- introduce a way to compute union using symbolic rectangular bounding boxes
- handle multiple load/store op's to the same memref by taking a union of the regions
- command-line argument to provide capacity of the fast memory space
- minor change to replaceAllMemRefUsesWith to not generate affine_apply if the
supplied index remap was identity
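Below is a minimal sketch of a rectangular bounding-box union. It assumes constant bounds rather than the symbolic bounds this change describes. `Box`, `boundingUnion` and `numElements` are hypothetical names, not the pass's API. For each dimension, the union takes the minimum lower bound and the maximum upper bound, so the result may over-approximate the exact union. Folding `boundingUnion` over the boxes of several loads/stores to one memref gives a single region to compare against the fast memory capacity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical rectangular region: inclusive [lb[d], ub[d]] per dimension.
// Constant bounds are an assumption made to keep the sketch small.
struct Box {
  std::vector<int64_t> lb, ub;
};

// Bounding-box union: the smallest box containing both inputs. It can cover
// points in neither input, so it over-approximates the exact union.
Box boundingUnion(const Box &a, const Box &b) {
  assert(a.lb.size() == b.lb.size() && a.ub.size() == b.ub.size() &&
         "boxes must have the same rank");
  Box result;
  for (std::size_t d = 0; d < a.lb.size(); ++d) {
    result.lb.push_back(std::min(a.lb[d], b.lb[d]));
    result.ub.push_back(std::max(a.ub[d], b.ub[d]));
  }
  return result;
}

// Element count of a box, e.g. for comparing a region's size against the
// capacity of the fast memory space.
int64_t numElements(const Box &box) {
  int64_t count = 1;
  for (std::size_t d = 0; d < box.lb.size(); ++d)
    count *= box.ub[d] - box.lb[d] + 1;
  return count;
}
```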
PiperOrigin-RevId: 230848185
canonicalizations of operations. The ultimate important user of this is
going to be a funcBuilder->foldOrCreate<YourOp>(...) API, but for now it
is just a more convenient way to write certain classes of canonicalizations
(see the change in StandardOps.cpp).
NFC.
PiperOrigin-RevId: 230770021
- switch some debug info to emitError
- use a single constant op for zero index to make it easier to write/update
test cases; avoid creating new constant op's for common zero index cases
- test case cleanup
This is in preparation for an upcoming major update to this pass.
PiperOrigin-RevId: 230728379
Example inline notation:
trailing-location ::= 'loc' '(' location ')'
// FileLineCol Location.
%1 = "foo"() : () -> i1 loc("mysource.cc":10:8)
// Name Location
return loc("foo")
// CallSite Location
return loc(callsite("foo" at "mysource.cc":19:9))
// Fused Location
/// Without metadata
func @inline_notation() loc(fused["foo", "mysource.cc":10:8])
/// With metadata
return loc(fused<"myPass">["foo", "foo2"])
// Unknown location.
return loc(unknown)
Locations are currently only printed with inline notation at the line of each instruction. Further work is needed to allow for reference notation, e.g.:
...
return loc 1
}
...
loc 1 = "source.cc":10:1
PiperOrigin-RevId: 230587621
This CL just changes various docs and comments to use the terms "generic" and
"custom" when mentioning assembly forms. To be consistent, several methods are
also renamed:
* FunctionParser::parseVerboseOperation() -> parseGenericOperation()
* ModuleState::hasShorthandForm() -> hasCustomForm()
* OpAsmPrinter::printDefaultOp() -> printGenericOp()
PiperOrigin-RevId: 230568819
- update fusion cost model to fuse while tolerating a certain amount of redundant
computation; add cl option -fusion-compute-tolerance
evaluate memory footprint and intermediate memory reduction
- emit debug info from -loop-fusion showing what was fused and why
- introduce function to compute memory footprint for a loop nest
- getMemRefRegion readability update - NFC
PiperOrigin-RevId: 230541857
This CL adds the Return op to EDSCs types and emitter.
This allows generating full function bodies that can be compiled all the way
down to LLVMIR and executed on CPU.
At this point, MLIR lacks the testing infrastructure to exercise this.
End-to-end testing of full functions written in EDSCs is left for a future CL.
PiperOrigin-RevId: 230527530
- unrolling a single iteration loop by a factor of one should promote its body
into its parent; this makes it consistent with the behavior/expectation that
unrolling a loop by a factor equal to its trip count makes the loop go away.
PiperOrigin-RevId: 230426499
- ForInst::walkOps will also be used in an upcoming CL (cl/229438679); better to have
this instead of deriving from the InstWalker
PiperOrigin-RevId: 230413820
- improve/fix doc comments for affine apply composition related methods.
- drop makeSingleValueComposedAffineApply - really redundant and out of line in
a public API; it's just returning the first result of the composed affine
apply op, and not making a single result affine map or an affine_apply op.
PiperOrigin-RevId: 230406169
- the size of the private memref created for the slice should be based on
the memref region accessed at the depth at which the slice is being
materialized, i.e., symbolic in the outer IVs up until that depth, as opposed
to the region accessed based on the entire domain.
- leads to a significant contraction of the temporary / intermediate memref
whenever the memref isn't reduced to a single scalar (through store fwd'ing).
Other changes
- update to promoteIfSingleIteration - avoid introducing unnecessary identity
map affine_apply from IV; makes it much easier to write and read test cases
and pass output for all passes that use promoteIfSingleIteration; loop-fusion
test cases become much simpler
- fix replaceAllMemrefUsesWith bug that was exposed by the above update -
'domInstFilter' could be one of the ops erased due to a memref replacement in
it.
- fix getConstantBoundOnDimSize bug: a division by the coefficient of the identifier was
missing (the latter need not always be 1); add lbFloorDivisors output argument
- rename getBoundingConstantSizeAndShape -> getConstantBoundingSizeAndShape
PiperOrigin-RevId: 230405218
*) Do not remove loop nests which write to memrefs which escape the function.
*) Do not remove memrefs which escape the function (e.g. are used in the return instruction).
PiperOrigin-RevId: 230398630
Add default values to attributes, to allow attributes to be left unspecified. The attr getter will always return an attribute, so callers need not check for it; if the attribute is not set, the default will be returned (at present the default is constructed upon query, but this will be changed).
Add op definition for tf.AvgPool in ops.td, and rewrite the matcher as a pattern using attribute matching & transforms. Add some helper functions to make this simpler.
Handle attributes with dialect prefix and map them to getter without dialect prefix.
Note: VerifyAvgPoolOp could probably be autogenerated by now given the predicate specification on attributes, but deferring that to a follow up.
PiperOrigin-RevId: 230364857
Start doc generation pass that generates simple markdown output. The output is formatted simply[1] in markdown, but this allows seeing what info we have, where we can refine the op description (e.g., the inputs is probably redundant), what info is missing (e.g., the attributes could probably have a description).
The formatting of the description is still left up to whatever was in the op definition (which luckily, due to the uniformity in the .td file, turned out well but relying on the indentation there is fragile). The mechanism to autogenerate these post changes has not been added yet either. The output file could be run through a markdown formatter too to remove extra spaces.
[1]. This is not a proposal for final style :) There could also be a discussion around single doc vs multiple (per dialect, per op), whether we want a TOC, whether operands/attributes should be headings or just formatted differently ...
PiperOrigin-RevId: 230354538
1) Fix FloatAttr type inconsistency in conversion from tf.FusedBatchNorm to TFLite ops
We used to compose the splat tensor out of the scalar epsilon attribute by using the
type of the variance operand. However, the epsilon attribute may have a different
bitwidth than the one in the variance operand. As a result, we were creating
inconsistent types within the FloatAttr itself.
2) Fix SplatElementsAttr type inconsistency in AnnotateInputArrays
We need to create the zero-valued attribute according to the type provided as the
command-line arguments.
3) Concretize the result type of tf.Shape constant folding test case
Currently the resultant constant is created by the constant folding harness, using
the result type of the original op as the constant's result type. That can be
a different type than the constant's internal DenseElementsAttr.
PiperOrigin-RevId: 230244665
- print multiplication by -1 as unary negate; expressions like s0 * -1, d0 * -1
+ d1 will now appear as -s0, -d0 + d1 resp.
- a minor cleanup while on printAffineExprInternal
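Here is a small sketch of the new printing rule. It uses a hypothetical flattened `(coefficient, identifier)` term representation, not MLIR's actual AffineExpr classes. A coefficient of -1 prints as a leading minus instead of a trailing `* -1`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flattened term: coefficient * identifier, e.g. (-1, "d0").
using Term = std::pair<int64_t, std::string>;

// Print one product; a multiplication by -1 is rendered as unary negation.
std::string printTerm(const Term &term) {
  if (term.first == 1)
    return term.second;
  if (term.first == -1)
    return "-" + term.second;
  return term.second + " * " + std::to_string(term.first);
}

// Print a sum of terms, e.g. {(-1, "d0"), (1, "d1")} prints as "-d0 + d1".
std::string printSum(const std::vector<Term> &terms) {
  std::string out;
  for (std::size_t i = 0; i < terms.size(); ++i) {
    if (i > 0)
      out += " + ";
    out += printTerm(terms[i]);
  }
  return out;
}
```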
PiperOrigin-RevId: 230222151
This CL also makes ScopedEDSCContexts reset the Bindable numbering when
creating a new context.
This is useful to write minimal tests that don't use FileCheck pattern
captures for now.
PiperOrigin-RevId: 230079997
This CL performs a bunch of cleanups related to EDSCs that are generally
useful in the context of using them with a simple wrapping C API (not in this
CL) and with simple language bindings to Python and Swift.
PiperOrigin-RevId: 230066505
*) Enables reduction of private memref size based on MemRef region accessed by fused slice.
*) Enables maximal fusion by creating a private memref to break a fusion-preventing dependence.
*) Adds maximal fusion flag to enable fusing as much as possible (though it still fuses the minimum cost computation slice).
PiperOrigin-RevId: 229936698
This CL adds a test reported by andydavis@ and fixes the corner case that
appears when operands do not come from an AffineApply and no Dim composition
is needed.
In such cases, we would need to create an empty map which is disallowed.
The composition in such cases becomes trivial: there is no composition.
This CL also updates the name AffineNormalizer to AffineApplyNormalizer.
PiperOrigin-RevId: 229819234
Change MinMaxAttr to match hasValidMinMaxAttribute behavior. After rewriting the other users of that function, it could be removed too. The currently generated error message is:
error: 'tfl.fake_quant' op attribute 'minmax' failed to satisfy constraint of MinMaxAttr
PiperOrigin-RevId: 229775631
This CL fixes a misunderstanding in how to build DimOp which triggered
execution issues in the CPU path.
The problem is that, given a `memref<?x4x?x8x?xf32>`, the expressions to
construct the dynamic dimensions should be:
`dim %arg, 0 : memref<?x4x?x8x?xf32>`
`dim %arg, 2 : memref<?x4x?x8x?xf32>`
and
`dim %arg, 4 : memref<?x4x?x8x?xf32>`
Before this CL, we would construct:
`dim %arg, 0 : memref<?x4x?x8x?xf32>`
`dim %arg, 1 : memref<?x4x?x8x?xf32>`
`dim %arg, 2 : memref<?x4x?x8x?xf32>`
and expect the other dimensions to be constants.
This assumption seems consistent at first glance with the syntax of alloc:
```
%tensor = alloc(%M, %N, %O) : memref<?x4x?x8x?xf32>
```
But this was actually incorrect.
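The corrected mapping can be sketched as follows. The sketch assumes a hypothetical shape encoding where -1 stands for `?`, and `dynamicDimPositions` is an illustrative name. The `dim` positions are the indices of the dynamic dimensions, so for `memref<?x4x?x8x?xf32>` they are 0, 2 and 4. The alloc operands `%M`, `%N`, `%O` correspond to those positions in order, not to positions 0, 1 and 2.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical shape encoding: kDynamic marks a '?' dimension.
constexpr int64_t kDynamic = -1;

// Positions to pass to `dim`: the indices of the dynamic dimensions, in order.
// The k-th alloc operand then sizes the dimension at positions[k].
std::vector<unsigned> dynamicDimPositions(const std::vector<int64_t> &shape) {
  std::vector<unsigned> positions;
  for (unsigned i = 0; i < shape.size(); ++i)
    if (shape[i] == kDynamic)
      positions.push_back(i);
  return positions;
}
```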
This CL also makes the relevant functions available to EDSCs and removes
duplication of the incorrect function.
PiperOrigin-RevId: 229622766
The operand and result types of binary ops are not necessarily the
same. For those binary ops, we cannot print in the short-form assembly.
Enhance impl::printBinaryOp to consider operand and result types
to select which assembly form to use.
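Below is a sketch of the selection logic. It models types as strings, which is an assumption, and `canUseShortForm` is a hypothetical name. It also assumes the short form prints a single trailing type, as in `%r = addf %a, %b : f32`. Under that assumption, the short form is faithful only when every operand type equals the result type; otherwise the generic form is needed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical check with types modeled by their printed names. A short form
// that prints one trailing type can only describe the op faithfully when all
// operand types equal the result type; otherwise use the generic form.
bool canUseShortForm(const std::vector<std::string> &operandTypes,
                     const std::string &resultType) {
  for (const std::string &type : operandTypes)
    if (type != resultType)
      return false;
  return true;
}
```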
PiperOrigin-RevId: 229608142
A recent change in TableGen definitions allowed arbitrary AND/OR predicate
compositions at the cost of removing known-true predicate simplification.
Introduce a more advanced simplification mechanism instead.
In particular, instead of folding predicate C++ expressions directly in
TableGen, keep them as is and build a predicate tree in TableGen C++ library.
The predicate expression-substitution mechanism, necessary to implement complex
predicates for nested classes such as `ContainerType`, is replaced by a
dedicated predicate. This predicate appears in the predicate tree and can be
used for tree matching and separation. More specifically, subtrees defined
below such predicate may be subject to different transformations than those
that appear above. For example, a subtree known to be true above the
substitution predicate is not necessarily true below it.
Use the predicate tree structure to eliminate known-true and known-false
predicates before code emission, as well as to collapse AND and OR predicates
if their value can be deduced based on the value of one child.
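Here is a minimal sketch of the known-true/known-false elimination. It uses a hypothetical `Pred` tree whose leaves are C++ condition strings. It leaves out the substitution predicate described above, which would limit how far known facts may propagate. Under AND, `True` children are dropped; under OR, `False` children are dropped. A single deciding child (`False` under AND, `True` under OR) collapses the whole node.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical predicate tree: leaves carry C++ condition strings.
struct Pred {
  enum Kind { True, False, Leaf, And, Or } kind;
  std::string expr;                            // used by Leaf only
  std::vector<std::shared_ptr<Pred>> children; // used by And/Or only
};
using PredPtr = std::shared_ptr<Pred>;

PredPtr makeConst(bool value) {
  return std::make_shared<Pred>(Pred{value ? Pred::True : Pred::False, "", {}});
}

// Simplify bottom-up. Under And, True children are neutral and a False child
// decides the result; under Or, it is the other way around.
PredPtr simplify(const PredPtr &p) {
  if (p->kind != Pred::And && p->kind != Pred::Or)
    return p;
  bool isAnd = p->kind == Pred::And;
  Pred::Kind neutral = isAnd ? Pred::True : Pred::False;
  Pred::Kind deciding = isAnd ? Pred::False : Pred::True;
  std::vector<PredPtr> kept;
  for (const PredPtr &child : p->children) {
    PredPtr s = simplify(child);
    if (s->kind == deciding)
      return makeConst(!isAnd); // And(..., false) is false, Or(..., true) is true
    if (s->kind != neutral)
      kept.push_back(s);
  }
  if (kept.empty())
    return makeConst(isAnd); // all children were neutral
  if (kept.size() == 1)
    return kept.front(); // no need for a one-child And/Or
  return std::make_shared<Pred>(Pred{p->kind, "", kept});
}

// Emit a C++ condition from a (simplified) tree.
std::string emit(const PredPtr &p) {
  switch (p->kind) {
  case Pred::True:
    return "true";
  case Pred::False:
    return "false";
  case Pred::Leaf:
    return p->expr;
  default: {
    std::string op = p->kind == Pred::And ? " && " : " || ";
    std::string out = "(";
    for (std::size_t i = 0; i < p->children.size(); ++i) {
      if (i > 0)
        out += op;
      out += emit(p->children[i]);
    }
    return out + ")";
  }
  }
}
```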
PiperOrigin-RevId: 229605997
Start simple with single predicate match & transform rules for attributes.
* It's unclear whether modelling Attr predicates will be needed, so start with allowing matching attributes with a single predicate.
* The input and output attr type often differs and so add ability to specify a transform between the input and output format.
PiperOrigin-RevId: 229580879
*) Adds support for fusing into consumer loop nests with multiple loads from the same memref.
*) Adds support for reducing slice loop trip count by projecting out destination loop IVs greater than destination loop depth.
*) Removes dependence on src loop depth and simplifies cost model computation.
PiperOrigin-RevId: 229575126
This is mostly plumbing to start allowing testing of EDSC lowering. Prototype specifying a reference implementation using the verbose format without any generation/binding support. Add a test pass that dumps the constructed EDSC (of which there can only be one). The idea is to enable iterating from multiple sides; this is wrong on many dimensions at the moment.
PiperOrigin-RevId: 229570535
In TableGen definitions, the "Type" class has been used for types of things
that can be stored in Attributes, but not necessarily present in the MLIR type
system. As a consequence, records like "String" or "DerviedAttrBody" were of
class "Type", which can be confusing. Furthermore, the "builderCall" field of
the "Type" class serves only for attribute construction. Some TableGen "Type"
subclasses that correspond to MLIR kinds of types do not have a canonical way
of construction only from the data available in TableGen, e.g. MemRefType would
require the list of affine maps. This leads to a conclusion that the entities
that describe types of objects appearing in Attributes should be independent of
"Type": they have some properties "Type"s don't and vice versa.
Do not parameterize Tablegen "Attr" class by an instance of "Type". Instead,
provide a "constBuilderCall" field that can be used to build an attribute from
a constant value stored in TableGen instead of indirectly going through
Attribute.Type.builderCall. Some attributes still don't have a
"constBuilderCall" because they used to depend on types without a
"builderCall".
Drop definitions of class "Type" that don't correspond to MLIR Types. Provide
infrastructure to define type-dependent attributes and string-backed attributes
for convenience.
PiperOrigin-RevId: 229570087
We also need the broadcast logic in the TensorFlow dialect. Move it to a
Dialect/ directory for a broader scope. This Dialect/ directory is intended
for code not in core IR, but can potentially be shared by multiple dialects.
Apart from fixing TensorFlow op TableGen to use this trait, this CL only
contains mechanical code shuffling.
PiperOrigin-RevId: 229563911
The constant folding rules assume value attributes of operands are already
verified to be in good standing.
For each op in the above, the constant folding rules support both integer and
floating point cases. Broadcast behavior is also supported as per the semantics
of TFLite ops.
This CL does not handle overflow/underflow cases yet.
PiperOrigin-RevId: 229441221
LLVM IR types are defined using MLIR's extendable type system. The dialect
provides the only type kind, LLVMType, that wraps an llvm::Type*. Since LLVM
IR types are pointer-unique, MLIR's type system relies on those pointers to
perform its own type unique'ing. Type parsing and printing is delegated to
LLVM libraries.
Define MLIR operations for the LLVM IR instructions currently used by the
translation to the LLVM IR Target to simplify eventual transition. Operations
classes are defined using TableGen. LLVM IR instruction operands that are only
allowed to take constant values are accepted as attributes instead. All
operations are using verbose form for printing and parsing.
PiperOrigin-RevId: 229400375
MLIR has support for type-polymorphic instructions, i.e. instructions that may
take arguments of different types. For example, standard arithmetic operations
take scalars, vectors or tensors. In order to express such instructions in
TableGen, we need to be able to verify that a type object satisfies certain
constraints, but we don't need to construct an instance of this type. The
existing TableGen definition of Type requires both. Extract out a
TypeConstraint TableGen class to define restrictions on types. Define the Type
TableGen class as a subclass of TypeConstraint for consistency. Accept records
of the TypeConstraint class instead of the Type class as values in the
Arguments class when defining operators.
Replace the predicate logic TableGen class based on conjunctive normal form
with the predicate logic classes allowing for arbitrary combinations of
predicates using Boolean operators (AND/OR/NOT). The combination is
implemented using simple string rewriting of C++ expressions and, therefore,
respects the short-circuit evaluation order. No logic simplification is
performed at the TableGen level so all expressions must be valid C++.
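Below is a sketch of the string-rewriting combination, written in C++ form. The helper names are hypothetical; the change itself does this with TableGen classes. Each operand is parenthesized so that precedence stays intact. Emitting `&&`/`||` keeps C++'s left-to-right short-circuit evaluation, so a later predicate may rely on an earlier one having held.

```cpp
#include <cassert>
#include <string>

// Hypothetical textual combinators for C++ predicate expressions.
// Parentheses protect each operand from precedence changes, and && / ||
// preserve left-to-right short-circuit evaluation in the emitted code.
std::string predAnd(const std::string &lhs, const std::string &rhs) {
  return "(" + lhs + ") && (" + rhs + ")";
}

std::string predOr(const std::string &lhs, const std::string &rhs) {
  return "(" + lhs + ") || (" + rhs + ")";
}

std::string predNot(const std::string &pred) { return "!(" + pred + ")"; }
```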
Maintaining CNF using TableGen only would have been complicated when one needed
to introduce top-level disjunction. It is also unclear if it could lead to a
significantly simpler emitted C++ code. In the future, we may replace inplace
predicate string combination with a tree structure that can be simplified in
TableGen's C++ driver.
Combined, these changes allow one to express traits like ArgumentsAreFloatLike
directly in TableGen instead of relying on C++ trait classes.
PiperOrigin-RevId: 229398247
This allows load, store and ForNest to be used with both Expr and Bindable.
This simplifies writing generic pieces of MLIR snippet.
For instance, a generic pointwise add can now be written:
```cpp
// Different Bindable ivs, one per loop in the loop nest.
auto ivs = makeBindables(shapeA.size());
Bindable zero, one;
// Same bindable, all equal to `zero`.
SmallVector<Bindable, 8> zeros(ivs.size(), zero);
// Same bindable, all equal to `one`.
SmallVector<Bindable, 8> ones(ivs.size(), one);
// clang-format off
Bindable A, B, C;
Stmt scalarA, scalarB, tmp;
Stmt block = edsc::Block({
ForNest(ivs, zeros, shapeA, ones, {
scalarA = load(A, ivs),
scalarB = load(B, ivs),
tmp = scalarA + scalarB,
store(tmp, C, ivs)
}),
});
// clang-format on
```
This CL also adds some extra support for pretty printing that will be used in
a future CL when we introduce standalone testing of EDSCs. At the moment we
are lacking the basic infrastructure to write such tests.
PiperOrigin-RevId: 229375850
DenseElementAttr currently does not support value bitwidths of > 64. This can result in asan failures and crashes when trying to invoke DenseElementsAttr::writeBits/DenseElementsAttr::readBits.
PiperOrigin-RevId: 229241125
*) LoopFusion: Adds a fusion cost function which compares the cost of the fused loop nest with the cost of the two unfused loop nests to determine if it is profitable to fuse the candidate loop nests. The fusion cost function is run for various combinations of src/dst loop depths, attempting to find the minimum-cost setting for src/dst loop depths that does not increase the computational cost when the loop nests are fused. Combinations of src/dst loop depths are evaluated with the aim of maximizing loop depth (i.e. taking a bigger computation slice from the source loop nest and inserting it deeper in the destination loop nest for better locality).
*) LoopFusion: Adds utility to compute op instance count for loop nests, sliced loop nests, and to compute the cost of a loop nest fused with another sliced loop nest.
*) LoopFusion: canonicalizes slice bound AffineMaps (and updates related tests).
*) Analysis::Utils: Splits getBackwardComputationSlice into two functions: one which calculates and returns the slice loop bounds for analysis by LoopFusion, and the other for insertion of the computation slice (once fusion has calculated the min-cost src/dst loop depths).
*) Test: Adds multiple unit tests to test the new functionality.
PiperOrigin-RevId: 229219757
This CL adds a short term remedy to an issue that was found during execution
tests.
Lowering of vector transfer ops uses the permutation map to determine which
ForInst have been super-vectorized. During materialization to HW vector sizes
however, some of those dimensions may be fully unrolled and do not appear in
the permutation map.
Such dimensions were then not clipped and may have accessed out of bounds.
This CL conservatively clips all dimensions to ensure no out of bounds access.
The longer term solution is still up for debate but will probably require
either passing more information between Materialization and lowering, or just
merging the 2 passes.
PiperOrigin-RevId: 228980787
Arguably the dependence of EDSCs on Analysis is not great but on the other
hand this is a strict improvement in the emitted IR and since EDSCs are an
alternative to builders it makes sense that they have as much access to
Analysis as Transforms.
PiperOrigin-RevId: 228967624
This CL is the 6th and last on the path to simplifying AffineMap composition.
This removes `AffineValueMap::forwardSubstitutions` and replaces it by simple
calls to `fullyComposeAffineMapAndOperands`.
PiperOrigin-RevId: 228962580
The const folding logic is structurally similar, so use a template
to abstract the common part.
Moved mul(x, 0) to a legalization pattern to be consistent with
mul(x, 1).
Also promoted getZeroAttr() to be a method on Builder since it is
expected to be frequently used.
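Here is a minimal sketch of factoring the shared folding logic into a template. It models constant operands as `std::optional<int64_t>`, and `constFoldBinary`, `foldAdd` and `foldMul` are hypothetical names, not the TFLite code. The template checks once that every operand is a constant, and each op supplies only its arithmetic.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>

// Shared part of binary constant folding: fold only when both operands are
// known constants (modeled here as non-empty optionals), then apply `calc`.
template <typename T, typename Calc>
std::optional<T> constFoldBinary(const std::optional<T> &lhs,
                                 const std::optional<T> &rhs, Calc calc) {
  if (!lhs || !rhs)
    return std::nullopt; // at least one operand is not a constant
  return calc(*lhs, *rhs);
}

// Per-op parts: only the arithmetic differs.
std::optional<int64_t> foldAdd(std::optional<int64_t> a, std::optional<int64_t> b) {
  return constFoldBinary(a, b, std::plus<int64_t>());
}

std::optional<int64_t> foldMul(std::optional<int64_t> a, std::optional<int64_t> b) {
  return constFoldBinary(a, b, std::multiplies<int64_t>());
}
```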
PiperOrigin-RevId: 228891989
Expand the type matcher template generator to consider a set of predicates that are known to
hold. This avoids inserting redundant checks for trivially true predicates
(for example, predicates that hold according to the op definition). This only targets predicates that trivially hold and does not attempt any logical equivalence proof.
PiperOrigin-RevId: 228880468
Multiple binaries need to open input files. Use this function
to de-duplicate the code.
Also changed openOutputFile() to return errors using std::string since
it is a library call and accessing I/O in a library call is not friendly.
PiperOrigin-RevId: 228878221
This CL is the 5th on the path to simplifying AffineMap composition.
This removes the distinction between normalized single-result AffineMap and
more general composed multi-result map.
One nice byproduct of making the implementation driven by single-result is
that the multi-result extension is a trivial change: the implementation is
still single-result and we just use:
```
unsigned idx = getIndexOf(...);
map.getResult(idx);
```
This CL also fixes an AffineNormalizer implementation issue related to symbols.
Namely it stops performing substitutions on symbols in AffineNormalizer and
instead concatenates them all to be consistent with the call to
`AffineMap::compose(AffineMap)`. This latter call to `compose` cannot perform
simplifications of symbols coming from different maps based on positions only:
i.e. dims are applied and renumbered but symbols must be concatenated.
The only way to determine whether symbols from different AffineApply are the
same is to look at the concrete values. The canonicalizeMapAndOperands is thus
extended with behavior to support replacing operands that appear multiple
times.
Lastly, this CL demonstrates that the implementation is correct by rewriting
ComposeAffineMaps using only `makeComposedAffineApply`. The implementation
uses a matcher because AffineApplyOp are introduced as composed operations on
the fly instead of iteratively forwardSubstituting. For this purpose, a walker
would revisit freshly introduced AffineApplyOp. Regardless, ComposeAffineMaps
is scheduled to disappear, this CL replaces the implementation based on
iterative `forwardSubstitute` by a composed-by-construction
`makeComposedAffineApply`.
Remaining calls to `forwardSubstitute` will be removed in the next CL.
PiperOrigin-RevId: 228830443
- FM has a worst case exponential complexity. For our purposes, this worst case
is rarely expected, but could still appear due to improperly constructed
constraints (e.g. a logical/memory error in other methods) or artificially
created arbitrarily complex integer sets (adversarial / fuzz tests).
Add a check to detect such an explosion in the number of constraints and
conservatively return false from isEmpty() (instead of running out of memory
or running for too long).
- Add an artificial virus test case.
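Below is a sketch of Fourier-Motzkin elimination with an explosion guard. It rests on several assumptions: constraints are dense integer rows meaning `a . x + c >= 0`, emptiness is decided over the rationals rather than exactly over the integers, and overflow is ignored. `eliminate`, `isEmpty` and `maxRows` are hypothetical names. A single elimination step can replace `p` positive rows and `n` negative rows with `p * n` new rows, and repeating this is the source of the exponential growth. When the count exceeds the limit, the check conservatively reports "not empty".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical dense row: coefficients a_0..a_{n-1} followed by a constant c,
// meaning a_0*x_0 + ... + a_{n-1}*x_{n-1} + c >= 0.
using Row = std::vector<int64_t>;

// Fourier-Motzkin elimination of column `var`. Returns std::nullopt instead of
// building more than `maxRows` rows.
std::optional<std::vector<Row>> eliminate(const std::vector<Row> &rows,
                                          std::size_t var, std::size_t maxRows) {
  std::vector<Row> pos, neg, result;
  for (const Row &row : rows) {
    if (row[var] > 0)
      pos.push_back(row);
    else if (row[var] < 0)
      neg.push_back(row);
    else
      result.push_back(row);
  }
  // Every (pos, neg) pair produces one new row.
  if (result.size() + pos.size() * neg.size() > maxRows)
    return std::nullopt;
  for (const Row &p : pos)
    for (const Row &n : neg) {
      Row combined(p.size());
      // Both multipliers are positive, so the inequality direction is kept
      // and the coefficient of `var` cancels out.
      for (std::size_t i = 0; i < p.size(); ++i)
        combined[i] = p[i] * -n[var] + n[i] * p[var];
      result.push_back(std::move(combined));
    }
  return result;
}

// True only if elimination completes and some row, now with all-zero
// coefficients, reads c >= 0 with c < 0. On explosion, answers false.
bool isEmpty(std::vector<Row> rows, std::size_t numVars, std::size_t maxRows) {
  for (const Row &row : rows)
    assert(row.size() == numVars + 1 && "coefficients plus one constant");
  for (std::size_t var = 0; var < numVars; ++var) {
    std::optional<std::vector<Row>> next = eliminate(rows, var, maxRows);
    if (!next)
      return false; // too many constraints: give up rather than blow up
    rows = std::move(*next);
  }
  for (const Row &row : rows)
    if (row[numVars] < 0)
      return true;
  return false;
}
```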
PiperOrigin-RevId: 228753496
This implements the lowering of `floordiv`, `ceildiv` and `mod` operators from
affine expressions to the arithmetic primitive operations. Integer division
rules in affine expressions explicitly require rounding towards either negative
or positive infinity unlike machine implementations that round towards zero.
In the general case, implementing `floordiv` and `ceildiv` using machine signed
division requires computing both the quotient and the remainder. When the
divisor is positive, this can be simplified by adjusting the dividend and the
quotient by one and switching signs.
In the current use cases, we are unlikely to encounter affine expressions with
negative divisors (affine divisions appear in loop transformations such as
tiling that guarantee that divisors are positive by construction). Therefore,
it is reasonable to use branch-free single-division implementation. In case of
affine maps, divisors can only be literals so we can check the sign and
implement the case for negative divisors when the need arises.
The affine lowering pass can still fail when applied to semi-affine maps
(division or modulo by a symbol).
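Here is a scalar sketch of the single-division formulas for a positive divisor. C++'s truncating `/` and `%` stand in for machine signed division, and the `?:` selects stand in for the select operations a lowering would emit. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// floor(a / b), b > 0: for negative a, divide -1 - a instead and map the
// quotient q back to -1 - q; no second division or remainder is needed.
int64_t floorDiv(int64_t a, int64_t b) {
  assert(b > 0 && "sketch only handles positive divisors");
  bool negative = a < 0;
  int64_t dividend = negative ? -1 - a : a;
  int64_t quotient = dividend / b;
  return negative ? -1 - quotient : quotient;
}

// ceil(a / b), b > 0: -((-a) / b) for a <= 0, otherwise ((a - 1) / b) + 1.
// Note: -a overflows for a == INT64_MIN.
int64_t ceilDiv(int64_t a, int64_t b) {
  assert(b > 0 && "sketch only handles positive divisors");
  bool nonPositive = a <= 0;
  int64_t dividend = nonPositive ? -a : a - 1;
  int64_t quotient = dividend / b;
  return nonPositive ? -quotient : quotient + 1;
}

// a mod b in [0, b): shift a negative truncated remainder up by b.
int64_t affineMod(int64_t a, int64_t b) {
  assert(b > 0 && "sketch only handles positive divisors");
  int64_t rem = a % b;
  return rem < 0 ? rem + b : rem;
}
```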
PiperOrigin-RevId: 228668181
* Get a specific successor operand.
* Iterator support for non successor operands.
* Fix bug when removing the last operand from the operand list of an Instruction.
* Get the argument number for a BlockArgument.
PiperOrigin-RevId: 228660898
- the double buffer should be indexed (iv floordiv step) % 2 and NOT (iv % 2);
step wasn't being accounted for.
- fix test cases, enable failing test cases
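Below is a sketch of the fixed index computation. The helper name is hypothetical, and the sketch assumes `iv >= 0` so that C++'s truncating `/` matches `floordiv`. With step 4, the IV takes the values 0, 4, 8, 12, so `iv % 2` is always 0 and every iteration would use the same buffer. By contrast, `(iv floordiv step) % 2` alternates 0, 1, 0, 1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: which half of a double buffer an iteration of
// `for %iv = lb to ub step step` should use. Dividing by the step turns the
// IV into an iteration count, so consecutive iterations alternate halves.
// Assumes iv >= 0, where C++'s truncating `/` agrees with floordiv.
int64_t doubleBufferIndex(int64_t iv, int64_t step) {
  assert(step > 0 && iv >= 0);
  return (iv / step) % 2;
}
```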
PiperOrigin-RevId: 228635726
This CL added a tblgen::Attribute class to wrap around raw TableGen
Record getValue*() calls on Attr defs, which will provide a nicer
API for handling TableGen Record.
PiperOrigin-RevId: 228581107
Originally, terminators were special kinds of operation and could not be
extended by dialects. Only builtin terminators were supported and they had
custom parsers and printers. Currently, "terminator" is a property of an
operation, making it possible for dialects to define custom terminators.
However, verbose forms of operation syntax were not designed to support
terminators that may have a list of successors (each successor contains a block
name and an optional operand list). Calling printDefaultOp on a terminator
drops all successor information. Dialects are thus required to provide custom
parsers and printers for their terminators.
Introduce the syntax for the list of successors in the verbose form of the
operation. Add support for printing and parsing verbose operations with
successors.
Note that this does not yet add support for unregistered terminators since
"terminator" is a property stored in AbstractOperation and therefore is only
available for registered operations that have an instance of AbstractOperation.
Add tests for verbose parsing. It is currently impossible to test round-trip
for verbose terminators because none of the known dialects use verbose syntax
for printing terminators by default, however the printer was exercised on the
LLVM IR dialect prototype.
PiperOrigin-RevId: 228566453
- fix visitDivExpr: constraints constructed for localVarCst used the original
divisor instead of the simplified divisor; fix this. Add a simple test case
in memref-bound-check that reproduces this bug - although this was encountered in the
context of slicing for fusion.
- improve mod expr flattening: when flattening mod expressions,
cancel out the GCD of the numerator and denominator so that we can get a
simpler flattened form along with a simpler floordiv local var for it
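Here is a sketch of the GCD cancellation. It assumes a hypothetical `FlatMod` representation of `(sum of coefficient * identifier) mod divisor` with a positive divisor. The cancellation relies on `(g * e) mod (g * c) == g * (e mod c)` for `g > 0`. For example, `(4 * d0 + 8 * d1) mod 12` becomes `4 * ((d0 + 2 * d1) mod 3)`, which needs a floordiv by 3 instead of 12.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical flattened mod: scale * ((sum_i coeffs[i] * x_i) mod divisor).
struct FlatMod {
  std::vector<int64_t> coeffs;
  int64_t divisor;
  int64_t scale;
};

// Cancel g = gcd(coefficients, divisor) using (g*e) mod (g*c) == g*(e mod c),
// valid for g > 0. The smaller divisor gives a simpler floordiv local.
FlatMod cancelGcd(std::vector<int64_t> coeffs, int64_t divisor) {
  assert(divisor > 0 && "mod by a positive constant");
  int64_t g = divisor;
  for (int64_t c : coeffs)
    g = std::gcd(g, c);
  for (int64_t &c : coeffs)
    c /= g;
  return {coeffs, divisor / g, g};
}
```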
PiperOrigin-RevId: 228539928
Supervectorization does not plan on handling multi-result AffineMaps and
non-canonical chains of > 1 AffineApplyOp.
This CL uses the simpler single-result unbounded AffineApplyOp in the
MaterializeVectors pass.
PiperOrigin-RevId: 228469085
This CL added a tblgen::Type class to wrap around raw TableGen
Record getValue*() calls on Type defs, which will provide a
nicer API for handling TableGen Record.
The PredCNF class is also updated to work together with
tblgen::Type.
PiperOrigin-RevId: 228429090
clients. Let's re-add it in the future if there is ever a reason to. NFC.
Unrelatedly, add a use of a variable to unbreak the non-assert build.
PiperOrigin-RevId: 228284026
This CL is the 4th on the path to simplifying AffineMap composition.
This CL extracts canonicalizeMapAndOperands so it can be reused by other
functions; in particular, this will be used in
`makeNormalizedAffineApply`.
PiperOrigin-RevId: 228277890
This CL is the 3rd on the path to simplifying AffineMap composition.
This CL just moves `makeNormalizedAffineApply` from VectorAnalysis to
AffineAnalysis where it more naturally belongs.
PiperOrigin-RevId: 228277182
This CL is the 2nd on the path to simplifying AffineMap composition.
This CL uses the now accepted `AffineExpr::compose(AffineMap)` to
implement `AffineMap::compose(AffineMap)`.
Implications of keeping the simplification function in
Analysis are documented where relevant.
PiperOrigin-RevId: 228276646
Alias identifiers can be used in the place of the types that they alias, and are defined as:
type-alias-def ::= '!' alias-name '=' 'type' type
type-alias ::= '!' alias-name
Example:
!avx.m128 = type vector<4 x f32>
...
"foo"(%x) : vector<4 x f32> -> ()
// becomes:
"foo"(%x) : !avx.m128 -> ()
PiperOrigin-RevId: 228271372
This CL is the 1st on the path to simplifying AffineMap composition.
This CL uses the now accepted AffineExpr.replaceDimsAndSymbols to
implement `AffineExpr::compose(AffineMap)`.
Arguably, `simplifyAffineExpr` should be part of IR and not Analysis but
this CL does not yet pull the trigger on that.
PiperOrigin-RevId: 228265845
- refactor toAffineFromEq and the code surrounding it; refactor code into
FlatAffineConstraints::getSliceBounds
- add FlatAffineConstraints methods to detect identifiers as mod's and div's of other
identifiers
- add FlatAffineConstraints::getConstantLower/UpperBound
- Address b/122118218 (don't assert on invalid fusion depths cmdline flags -
instead, don't do anything; change cmdline flags
src-loop-depth -> fusion-src-loop-depth)
- AffineExpr/Map print method update: don't fail on null instances (since we have
a wrapper around a pointer, it's avoidable); rationale: dump/print methods should
never fail if possible.
- Update memref-dataflow-opt to add an optimization to avoid an unnecessary call to
IsRangeOneToOne when it's trivially going to be true.
- Add additional test cases to exercise the new support
- update a few existing test cases since the maps are now generated uniformly with
all destination loop operands appearing for the backward slice
- Fix projectOut - fix wrong range for getBestElimCandidate.
- Fix for getConstantBoundOnDimSize() - didn't show up in any test cases since
we didn't have any non-hyperrectangular ones.
PiperOrigin-RevId: 228265152
- Detect 'mod' to replace the combination of floordiv, mul, and subtract when
possible at construction time; when 'c' is a power of two, this reduces the number of
operations; also more compact and readable. Update simplifyAdd for this.
On a side note:
- with the affine expr flattening we have, a mod expression like d0 mod c
would be flattened into d0 - c * q, c * q <= d0 <= c*q + c - 1, with 'q'
being added as the local variable (q = d0 floordiv c); as a result, a mod
was turned into a floordiv whenever the expression was reconstructed back,
i.e., as d0 - c * (d0 floordiv c); as a result of this change, we recover
the mod back.
- rename SimplifyAffineExpr -> SimplifyAffineStructures (pass had been renamed but
the file hadn't been).
PiperOrigin-RevId: 228258120
- when SSAValue/MLValue existed, code at several places was forced to create additional
aggregate temporaries of SmallVector<SSAValue/MLValue> to handle the conversion; get
rid of such redundant code
- use filling ctors instead of explicit loops
- for smallvectors, change insert(list.end(), ...) -> append(...)
- improve comments at various places
- turn getMemRefAccess into MemRefAccess ctor and drop duplicated
getMemRefAccess. In the next CL, provide getAccess() accessors for load,
store, DMA op's to return a MemRefAccess.
PiperOrigin-RevId: 228243638
Use "native" vs "derived" to differentiate attributes on ops: native ones
are specified when creating the op as a part of defining the op, while
derived ones are computed from properties of the op.
PiperOrigin-RevId: 228186962
Bind attributes similarly to operands. Use this to rewrite the LeakyRelu and const rewrite patterns. The attribute type/attributes are not currently checked, so this should only be used where the attributes match due to the construction of the op.
To support current attribute namespacing, convert __ in attribute names to "$" for matching purposes ('$' is not a valid character in TableGen variable names).
Some simplification to make it easier to specify indented ostreams and avoid so many spaces. The goal is not perfectly formatted generated code, but code good enough that it is still easy for a user to read.
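A minimal sketch of the "__" to "$" name conversion described above, assuming non-overlapping, left-to-right replacement. `convertAttrName` is a hypothetical name, not the RewriterGen code:

```c++
#include <cstddef>
#include <string>

// Hypothetical helper: rewrites every non-overlapping "__" in a
// TableGen-side attribute name to "$", giving the name used for matching.
std::string convertAttrName(const std::string &name) {
  std::string result;
  result.reserve(name.size());
  for (std::size_t i = 0; i < name.size(); ++i) {
    if (name[i] == '_' && i + 1 < name.size() && name[i + 1] == '_') {
      result += '$';
      ++i;  // skip the second underscore of the pair
    } else {
      result += name[i];
    }
  }
  return result;
}
```

For example, `tf__alpha` becomes `tf$alpha`, while a single underscore is left untouched.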
PiperOrigin-RevId: 228183639
The `for` instruction defines the loop induction variable it uses. In
well-formed IR, the induction variable can only be used by the body of the
`for` loop. The existing implementation explicitly cleaned the body of the
for loop to remove all uses of the induction variable before removing its
definition. However, in ill-formed IR that may appear in some stages of
parsing, there may be (invalid) users of the loop induction variable outside
the loop body. In case of unsuccessful parsing, the destructor of the
ForInst-defined Value would assert because invalid users of this Value
remained. Explicitly drop all uses of the loop induction Value when
destroying a ForInst. It is no longer necessary to explicitly clean the body
of the loop; the destructor of the block takes care of this.
PiperOrigin-RevId: 228168880
When destroying a FunctionParser in case of parsing failure, we clean up all
uses of undefined forward-declared references. This has been implemented as
iteration over the list of uses. However, deleting one use from the list
invalidates the iterator (`IROperand::drop` sets `nextUse` to `nullptr` while
the iterator reads `nextUse` to advance; therefore only the first use was
deleted from the list). Get a new iterator before calling drop to avoid
invalidation.
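The failure mode above can be reproduced with a tiny, hypothetical model of an intrusive use list (`Value`, `Use`, and the `dropAll*` functions are illustrative, not the MLIR classes). Here `drop()` clears `nextUse`, as the description says `IROperand::drop` does, so advancing through a just-dropped use ends the loop early. Re-reading the head of the list on every iteration is one way to get a fresh iterator:

```c++
struct Value;

// One use of a Value; uses of the same Value form a singly linked list.
struct Use {
  Value *value = nullptr;
  Use *nextUse = nullptr;
  void drop();
};

struct Value {
  Use *firstUse = nullptr;
  void addUse(Use &u) {  // push at the head of the use list
    u.value = this;
    u.nextUse = firstUse;
    firstUse = &u;
  }
};

// Unlinks this use and, like the described IROperand::drop, clears nextUse.
void Use::drop() {
  Use **link = &value->firstUse;
  while (*link != this)
    link = &(*link)->nextUse;
  *link = nextUse;
  nextUse = nullptr;
  value = nullptr;
}

// Buggy: reads u->nextUse after drop() has set it to nullptr.
int dropAllBuggy(Value &v) {
  int dropped = 0;
  for (Use *u = v.firstUse; u != nullptr; u = u->nextUse) {
    u->drop();
    ++dropped;
  }
  return dropped;
}

// Fixed: take a fresh iterator (the current head) before each drop.
int dropAllFixed(Value &v) {
  int dropped = 0;
  while (Use *u = v.firstUse) {
    u->drop();
    ++dropped;
  }
  return dropped;
}
```

With three uses, the buggy loop drops only the first one; the fixed loop empties the list.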
PiperOrigin-RevId: 228168849
getAffineBinaryOpExpr for consistency (NFC)
- this is consistent with the name of the class and getAffineDimExpr/ConstantExpr, etc.
PiperOrigin-RevId: 228164959
Integer comparisons can be constant-folded if both of their arguments are known
constants, which we can compare in the compiler. This requires implementing
all comparison predicates, but thanks to the consistency between LLVM and MLIR
comparison predicates, there is a one-to-one correspondence between predicates
and llvm::APInt comparison functions. Constant folding of comparisons with the
maximum/minimum values of the integer type is left for future work.
This will be used to test the lowering of mod/floordiv/ceildiv in affine
expressions at compile time.
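A sketch of the predicate-to-comparison mapping, assuming 64-bit constants in place of `llvm::APInt` (whose `eq`/`slt`/`ult`-style methods play the same role). The enum and `foldCmpI` are hypothetical and only mirror the ten integer predicates:

```c++
#include <cstdint>

// Hypothetical mirror of the integer comparison predicates.
enum class CmpIPredicate { EQ, NE, SLT, SLE, SGT, SGE, ULT, ULE, UGT, UGE };

// Folds a comparison of two constants. The bits are signless; the predicate
// decides whether they are compared as signed or unsigned values.
bool foldCmpI(CmpIPredicate pred, std::uint64_t lhs, std::uint64_t rhs) {
  const auto slhs = static_cast<std::int64_t>(lhs);
  const auto srhs = static_cast<std::int64_t>(rhs);
  switch (pred) {
  case CmpIPredicate::EQ:  return lhs == rhs;
  case CmpIPredicate::NE:  return lhs != rhs;
  case CmpIPredicate::SLT: return slhs < srhs;
  case CmpIPredicate::SLE: return slhs <= srhs;
  case CmpIPredicate::SGT: return slhs > srhs;
  case CmpIPredicate::SGE: return slhs >= srhs;
  case CmpIPredicate::ULT: return lhs < rhs;
  case CmpIPredicate::ULE: return lhs <= rhs;
  case CmpIPredicate::UGT: return lhs > rhs;
  case CmpIPredicate::UGE: return lhs >= rhs;
  }
  return false;  // unreachable for valid predicates
}
```

The all-ones bit pattern is less than zero under `SLT` (it is -1) but not under `ULT` (it is the maximum unsigned value).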
PiperOrigin-RevId: 228077580
These operations trivially map to LLVM IR counterparts for operands of scalar
and (one-dimensional) vector type. Multi-dimensional vector and tensor type
operands would fail type conversion before the operation conversion takes
place. Add tests for scalar and vector cases. Also add a test for vector
`select` instruction for consistency with other tests.
PiperOrigin-RevId: 228077564
This adds signed/unsigned integer division and remainder operations to the
StandardOps dialect. Two versions are required because MLIR integers are
signless, but the meaning of the leading bit is important in division and
affects the results. LLVM IR made a similar choice. Define the operations in
the tablegen file and add simple constant folding hooks in the C++
implementation. Handle signed division overflow and division by zero errors in
constant folding. Canonicalization is left for future work.
These operations are necessary to lower affine_apply's down to LLVM IR.
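A sketch of the folding checks, assuming 64-bit operands. The function names are hypothetical. Signed and unsigned variants differ only in how the same bits are interpreted, and the signed ones must refuse to fold the single overflowing case as well as division by zero:

```c++
#include <cstdint>
#include <limits>
#include <optional>

// std::nullopt means "do not fold": the operation is left in the IR.
std::optional<std::int64_t> foldSignedDiv(std::int64_t lhs, std::int64_t rhs) {
  if (rhs == 0)
    return std::nullopt;  // division by zero
  if (lhs == std::numeric_limits<std::int64_t>::min() && rhs == -1)
    return std::nullopt;  // INT64_MIN / -1 overflows
  return lhs / rhs;       // C++ truncates toward zero
}

// INT64_MIN % -1 is also undefined in C++, so this sketch skips it too.
std::optional<std::int64_t> foldSignedRem(std::int64_t lhs, std::int64_t rhs) {
  if (rhs == 0)
    return std::nullopt;
  if (lhs == std::numeric_limits<std::int64_t>::min() && rhs == -1)
    return std::nullopt;
  return lhs % rhs;  // sign follows the dividend
}

// Unsigned variants only need the division-by-zero check.
std::optional<std::uint64_t> foldUnsignedDiv(std::uint64_t lhs, std::uint64_t rhs) {
  if (rhs == 0)
    return std::nullopt;
  return lhs / rhs;
}

std::optional<std::uint64_t> foldUnsignedRem(std::uint64_t lhs, std::uint64_t rhs) {
  if (rhs == 0)
    return std::nullopt;
  return lhs % rhs;
}
```

The same bit pattern for -8 gives -4 under signed division by 2, but a large positive value under unsigned division. This is why both versions are needed for signless integers.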
PiperOrigin-RevId: 228077549
Expand type to include matcher predicates. Use conjunctive normal form (CNF) to allow specifying combinations of constraints for a type. The matching call for the type is used to verify the construction of the operation as well as in rewrite pattern generation.
The matching initially includes redundant checks (e.g., even if the operand of the op is guaranteed to satisfy some requirement, it is still checked during matcher generation for now). Likewise, some of the specified traits now check what the generated code already checks. Some of the traits can be removed in the future, as the verify method will already include the relevant checks based on the op definition.
More work is needed for variadic operands.
CNF is used so that, in a follow-up, redundant checks in the rewrite patterns can be omitted (e.g., when matching an F32Tensor, one does not need to verify that op X's operand 0 is a Tensor if that is guaranteed by op X's definition). The alternative was to specify a single matcher function, but this would not allow reasoning about which attributes already hold (at the level of PredAtoms).
Use these new operand type restrictions to rewrite BiasAdd with floating-point operands as a declarative pattern.
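A hypothetical sketch of CNF type constraints (the `Atom`, `Elt`, and `TypeInfo` types are illustrative only): every clause must hold, and a clause holds if any of its atoms does. The sketch also shows why CNF helps with the follow-up. A clause containing an atom already guaranteed by the op definition is always true and can be dropped from the generated matcher.

```c++
#include <algorithm>
#include <vector>

// Hypothetical atoms and type description, for illustration only.
enum class Atom { IsTensor, IsF32, IsF16 };
enum class Elt { F16, F32, I32 };
struct TypeInfo {
  bool isTensor;
  Elt element;
};

bool eval(Atom atom, const TypeInfo &t) {
  switch (atom) {
  case Atom::IsTensor: return t.isTensor;
  case Atom::IsF32: return t.element == Elt::F32;
  case Atom::IsF16: return t.element == Elt::F16;
  }
  return false;
}

using Clause = std::vector<Atom>;  // disjunction: some atom must hold
using CNF = std::vector<Clause>;   // conjunction: every clause must hold

bool satisfies(const CNF &cnf, const TypeInfo &t) {
  return std::all_of(cnf.begin(), cnf.end(), [&](const Clause &clause) {
    return std::any_of(clause.begin(), clause.end(),
                       [&](Atom atom) { return eval(atom, t); });
  });
}

// Drops clauses containing an atom the op definition already guarantees;
// such clauses are always true, so the matcher need not check them.
CNF dropGuaranteed(const CNF &cnf, const std::vector<Atom> &guaranteed) {
  CNF result;
  for (const Clause &clause : cnf) {
    bool redundant = std::any_of(clause.begin(), clause.end(), [&](Atom atom) {
      return std::find(guaranteed.begin(), guaranteed.end(), atom) !=
             guaranteed.end();
    });
    if (!redundant)
      result.push_back(clause);
  }
  return result;
}
```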
PiperOrigin-RevId: 227991412
- this is CL 1/2 that does a clean up and gets rid of one limitation in an
underlying method - as a result, fusion works for more cases.
- fix bugs/incomplete impl. in toAffineMapFromEq
- fusing across rank changing reshapes for example now just works
For example, given a rank 1 memref to rank 2 memref reshape (64 -> 8 x 8) like this,
-loop-fusion -memref-dataflow-opt now completely fuses and inlines/store-forwards
to get rid of the temporary:
INPUT
// Rank 1 -> Rank 2 reshape
for %i0 = 0 to 64 {
  %v = load %A[%i0]
  store %v, %B[%i0 floordiv 8, %i0 mod 8]
}
for %i1 = 0 to 8 {
  for %i2 = 0 to 8 {
    %w = load %B[%i1, %i2]
    "foo"(%w) : (f32) -> ()
  }
}
OUTPUT
$ mlir-opt -loop-fusion -memref-dataflow-opt fuse_reshape.mlir
#map0 = (d0, d1) -> (d0 * 8 + d1)
mlfunc @fuse_reshape(%arg0: memref<64xf32>) {
  for %i0 = 0 to 8 {
    for %i1 = 0 to 8 {
      %0 = affine_apply #map0(%i0, %i1)
      %1 = load %arg0[%0] : memref<64xf32>
      "foo"(%1) : (f32) -> ()
    }
  }
}
AFAIK, no polyhedral tool or compiler can perform such fusion, because it is
not really standard loop fusion; it is possible through a generalized
slicing-based approach such as ours.
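To see why the fused output is equivalent, here is a plain C++ simulation of the INPUT and OUTPUT nests (an illustration, not compiler output). Since i0 = i1 * 8 + i2 with 0 <= i2 < 8, the store to B[i0 floordiv 8][i0 mod 8] writes exactly the element later read as B[i1][i2], so the read can be forwarded from A through #map0.

```c++
#include <array>

using Vec64 = std::array<float, 64>;
using Mat8 = std::array<std::array<float, 8>, 8>;

// INPUT: reshape A (64) into the temporary B (8x8), then call "foo" on
// every B[i1][i2]. Returns the sequence of values passed to "foo".
Vec64 runUnfused(const Vec64 &A) {
  Mat8 B{};
  for (int i0 = 0; i0 < 64; ++i0)
    B[i0 / 8][i0 % 8] = A[i0];  // i0 >= 0, so / and % agree with floordiv/mod
  Vec64 fooArgs{};
  int n = 0;
  for (int i1 = 0; i1 < 8; ++i1)
    for (int i2 = 0; i2 < 8; ++i2)
      fooArgs[n++] = B[i1][i2];
  return fooArgs;
}

// OUTPUT: no temporary; the access goes through #map0 = (d0, d1) -> (d0 * 8 + d1).
Vec64 runFused(const Vec64 &A) {
  Vec64 fooArgs{};
  int n = 0;
  for (int i0 = 0; i0 < 8; ++i0)
    for (int i1 = 0; i1 < 8; ++i1)
      fooArgs[n++] = A[i0 * 8 + i1];
  return fooArgs;
}
```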
PiperOrigin-RevId: 227918338
Supervectorization does not plan on handling multi-result AffineMaps and
non-canonical chains of > 1 AffineApplyOp.
This CL introduces a simpler abstraction and composition of single-result
unbounded AffineApplyOp by using the existing unbound AffineMap composition.
This CL adds a simple API call and relevant tests:
```c++
OpPointer<AffineApplyOp> makeNormalizedAffineApply(
FuncBuilder *b, Location loc, AffineMap map, ArrayRef<Value*> operands);
```
which creates a single-result unbounded AffineApplyOp.
The operands of the AffineApplyOp are, by construction, not themselves results
of an AffineApplyOp.
This represents the simplest possible interface to complement the composition
of (mathematical) AffineMap, for the cases when we are interested in applying
it to Value*.
In this CL the composed AffineMap is not compressed (i.e. there exist operands
that are not part of the result). A followup commit will compress to normal
form.
The single-result unbounded AffineApplyOp abstraction will be used in a
followup CL to support the MaterializeVectors pass.
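The following sketch shows what composing single-result maps amounts to, restricted to purely linear expressions. This is an assumption: real affine expressions also carry floordiv/ceildiv/mod and distinguish dims from symbols. `LinearExpr` and `substitute` are hypothetical. Substituting the expression that produced an operand yields an expression over the producer's operands, so no affine_apply result remains as an operand once every such operand has been substituted.

```c++
#include <cstdint>
#include <map>

// Hypothetical linear expression: sum(coeffs[v] * operand_v) + constant,
// where the key v identifies an SSA operand.
struct LinearExpr {
  std::map<int, std::int64_t> coeffs;
  std::int64_t constant = 0;
};

// Replaces operand `var` of `outer` by the expression `inner` that produced
// it. Operands whose coefficient becomes zero are kept (no compression).
LinearExpr substitute(const LinearExpr &outer, int var, const LinearExpr &inner) {
  LinearExpr result = outer;
  auto it = result.coeffs.find(var);
  if (it == result.coeffs.end())
    return result;  // `var` does not occur; nothing to compose
  const std::int64_t scale = it->second;
  result.coeffs.erase(it);
  for (const auto &[v, c] : inner.coeffs)
    result.coeffs[v] += scale * c;
  result.constant += scale * inner.constant;
  return result;
}
```

For instance, composing 2 * x + 3 with x = y0 + 4 * y2 + 1 gives 2 * y0 + 8 * y2 + 5.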
PiperOrigin-RevId: 227879021
This impl class currently provides the following:
* auto definition of the 'ImplType = StorageClass'
* get/getChecked wrappers around TypeUniquer
* 'verifyConstructionInvariants' hook
- This hook verifies that the arguments passed into get/getChecked are valid
to construct a type instance with.
With this, all non-generic type uniquing has been moved out of MLIRContext.cpp
PiperOrigin-RevId: 227871108
symbols.
Included with this is some other infra:
- Testcases for other canonicalizations that I will implement next.
- Some helpers in AffineMap/Expr for doing simple walks without defining whole
visitor classes.
- A 'replaceDimsAndSymbols' facility that I'll be using to simplify maps and
exprs, e.g. to fold one constant into a mapping and to drop/renumber unused dims.
- Allow index (and everything else) to work in memref's, as we previously
discussed, to make the testcase easier to write.
- A "getAffineBinaryExpr" helper to produce a binop when you know the kind as
an enum.
This line of work will eventually subsume the ComposeAffineApply pass, but it is nowhere close to that yet :-)
PiperOrigin-RevId: 227852951
Use tablegen to generate definitions of the standard binary arithmetic
operations. These operations share a lot of boilerplate that is better off
generated by a tool.
Using tablegen for standard binary arithmetic operations requires the following
modifications.
1. Add a bit field `hasConstantFolder` to the base Op tablegen class; generate
the `constantFold` method signature if the bit is set. Differentiate between
single-result and zero/multi-result functions that use different signatures.
The implementation of the method remains in C++, similarly to canonicalization
patterns, since it may be large and non-trivial.
2. Define the `AnyType` record of class `Type` since `BinaryOp` currently
provided in op_base.td is supposed to operate on tensors and other tablegen
users may rely on this behavior.
Note that this drops the inline documentation on the operation classes that was
copy-pasted around anyway. Since we don't generate g3doc from tablegen yet,
keep LangRef.md as it is. Eventually, the user documentation can move to the
tablegen definition file as well.
PiperOrigin-RevId: 227820815
Even though it is unexpected except in pathological cases, a nullptr clone may
be returned. This CL handles the nullptr return gracefully.
PiperOrigin-RevId: 227764615
The strict requirement (i.e. at least 2 HW vectors in a super-vector) was a
premature optimization to avoid interfering with other vector code potentially
introduced via other means.
This CL avoids this premature optimization and the spurious errors it causes
when super-vector size == HW vector size (which is a possible corner case).
This may be revisited in the future.
PiperOrigin-RevId: 227763966
This corner case was found when stress testing with a functional end-to-end CPU
path. In the case where the hardware vector size is 1x...x1, the `keep` vector
is empty, which would result in a crash.
While there is no reason to expect a 1x...x1 HW vector in practice, this case
can just gracefully degrade to scalar, which is what this CL allows.
PiperOrigin-RevId: 227761097
* Match using isa
- This limits the rewrite pattern to ops defined in the op registry, but that is probably a better end state (especially for additional verification).
PiperOrigin-RevId: 227598946
Dialect-specific types are registered similarly to operations, i.e. via registerType<...> within the dialect. Unlike operations, there is no notion of a "verbose" type; that is, *all* types must be registered to a dialect. Casting support (isa/dyn_cast/etc.) is implemented by reserving a range of type kinds in the top-level Type class, rather than by string comparison as for operations.
To support derived types a few hooks need to be implemented:
In the concrete type class:
- static char typeID;
* A unique identifier for the type used during registration.
In the Dialect:
- typeParseHook and typePrintHook must be implemented to provide parser and printer support.
The syntax for dialect extended types is as follows:
dialect-type: '!' dialect-namespace '<' '"' type-specific-data '"' '>'
The 'type-specific-data' is information used to identify different types within the dialect, e.g:
- !tf<"variant"> // TensorFlow Variant Type
- !tf<"string"> // TensorFlow String Type
TensorFlow/TensorFlowControl types are now implemented as dialect specific types as a proof
of concept.
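A minimal parser for the `dialect-type` grammar above. It assumes that namespaces consist of letters, digits, and underscores and that the type data contains no escaped quotes. `parseDialectType` and `DialectTypeSpelling` are hypothetical, not the MLIR parser API:

```c++
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical parsed form of a dialect type such as !tf<"variant">.
struct DialectTypeSpelling {
  std::string dialectNamespace;
  std::string typeData;
};

// dialect-type: '!' dialect-namespace '<' '"' type-specific-data '"' '>'
std::optional<DialectTypeSpelling> parseDialectType(const std::string &s) {
  if (s.empty() || s[0] != '!')
    return std::nullopt;
  std::size_t lt = s.find('<');
  if (lt == std::string::npos || lt == 1)
    return std::nullopt;  // missing '<' or empty namespace
  std::string ns = s.substr(1, lt - 1);
  for (char c : ns)
    if (!std::isalnum(static_cast<unsigned char>(c)) && c != '_')
      return std::nullopt;
  // The rest must be exactly '"' data '"' '>'.
  std::size_t n = s.size();
  if (n < lt + 4 || s[lt + 1] != '"' || s[n - 2] != '"' || s[n - 1] != '>')
    return std::nullopt;
  std::string data = s.substr(lt + 2, n - 2 - (lt + 2));
  if (data.find('"') != std::string::npos)
    return std::nullopt;  // escaped quotes are not handled in this sketch
  return DialectTypeSpelling{ns, data};
}
```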
PiperOrigin-RevId: 227580052
This change is mechanical and merges the LowerAffineApplyPass and
LowerIfAndForPass into a single LowerAffinePass. It makes a step towards
defining an "affine dialect" that would contain all polyhedral-related
constructs. The motivation for merging these two passes is based on retiring
MLFunctions and, eventually, transforming If and For statements into regular
operations. After that happens, LowerAffinePass becomes yet another
legalization.
PiperOrigin-RevId: 227566113
The existing implementation was created before the ML/CFG unification refactoring and
did not concern itself with further lowering to separate concerns. As a
result, it emitted `affine_apply` instructions to implement `for` loop bounds
and `if` conditions and required a follow-up function pass to lower those
`affine_apply` to arithmetic primitives. In the unified function world,
LowerForAndIf is mostly a lowering pass with low complexity. As we move
towards a dialect for affine operations (including `for` and `if`), it makes
sense to lower `for` and `if` conditions directly to arithmetic primitives
instead of relying on `affine_apply`.
Expose the `expandAffineExpr` function in LoweringUtils. Use this function
together with `expandAffineMaps` to emit primitives that implement loop and
branch conditions directly.
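For a positive divisor, floordiv, ceildiv, and mod can be expanded using only truncating signed division/remainder, a comparison, and selects. The sketch below shows one common form of these expansions. The helper names are hypothetical, ternaries stand in for `select`, and the INT64_MIN corner case in `lowerCeilDiv` is ignored:

```c++
#include <cstdint>

// floordiv(a, b), b > 0:  a >= 0 ? a / b : -1 - (-1 - a) / b
std::int64_t lowerFloorDiv(std::int64_t lhs, std::int64_t rhs) {
  const bool negative = lhs < 0;                            // signed compare
  const std::int64_t dividend = negative ? -1 - lhs : lhs;  // select
  const std::int64_t quotient = dividend / rhs;             // truncating div
  return negative ? -1 - quotient : quotient;               // select
}

// ceildiv(a, b), b > 0:  a <= 0 ? -((-a) / b) : 1 + (a - 1) / b
std::int64_t lowerCeilDiv(std::int64_t lhs, std::int64_t rhs) {
  const bool nonPositive = lhs <= 0;
  const std::int64_t dividend = nonPositive ? -lhs : lhs - 1;
  const std::int64_t quotient = dividend / rhs;
  return nonPositive ? -quotient : 1 + quotient;
}

// mod(a, b), b > 0: the result is always in [0, b - 1].
std::int64_t lowerMod(std::int64_t lhs, std::int64_t rhs) {
  const std::int64_t rem = lhs % rhs;  // sign follows lhs
  return rem < 0 ? rem + rhs : rem;
}
```

For example, floordiv(-7, 2) is -4, ceildiv(-7, 2) is -3, and mod(-7, 2) is 1, whereas plain truncating division and remainder would give -3 and -1.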
Also remove tests that become unnecessary after transforming LowerForAndIf into
a function pass.
PiperOrigin-RevId: 227563608
In LoweringUtils, extract out `expandAffineMap`. This function takes an affine
map and a list of values the map should be applied to and emits a sequence of
arithmetic instructions that implement the affine map. It is independent of
the AffineApplyOp and can be used in places where we need to insert an
evaluation of an affine map without relying on a (temporary) `affine_apply`
instruction. This prepares for a merge between LowerAffineApply and
LowerForAndIf passes.
Move the `expandAffineApply` function to the LowerAffineApply pass since it is
the only place that must be aware of the `affine_apply` instructions.
PiperOrigin-RevId: 227563439
The entire compiler now looks at structural properties of the function (e.g.
does it have one block, does it contain an if/for stmt, etc) so the only thing
holding up this difference is round tripping through the parser/printer syntax.
Removing this shrinks the compiler by ~140 LOC.
This is step 31/n towards merging instructions and statements. The last step
is updating the docs, which I will do as a separate patch in order to split it
from this mostly mechanical patch.
PiperOrigin-RevId: 227540453
Moving forward, dialect namespaces cannot contain '.' characters.
This CL also standardizes that operation names must begin with the dialect namespace followed by a '.'.
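A sketch of the two naming rules, assuming a non-empty namespace. The helper names are hypothetical:

```c++
#include <string>

// Rule 1: dialect namespaces cannot contain '.'.
bool isValidDialectNamespace(const std::string &ns) {
  return ns.find('.') == std::string::npos;
}

// Rule 2: an op name is "<namespace>.<suffix>" with a non-empty suffix.
// Assumes `ns` is non-empty.
bool isValidOpName(const std::string &opName, const std::string &ns) {
  return isValidDialectNamespace(ns) && opName.size() > ns.size() + 1 &&
         opName.compare(0, ns.size(), ns) == 0 && opName[ns.size()] == '.';
}
```

Checking the character right after the prefix matters: "tfl.add" starts with "tf" but does not belong to the "tf" namespace.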
PiperOrigin-RevId: 227532193
This commit adds support for the "select" operation that lowers directly into
its LLVM IR counterpart. A simple test is included.
PiperOrigin-RevId: 227527893
runOnCFG/MLFunction override locations. Passes that care can handle this
filtering if they choose. Also, eliminate one needless difference between
CFG/ML functions in the parser.
This is step 30/n towards merging instructions and statements.
PiperOrigin-RevId: 227515912
This CL introduces a simple set of Embedded Domain-Specific Components (EDSCs)
in MLIR components:
1. a `Type` system of shell classes that closely matches the MLIR type system. These
types are subdivided into `Bindable` leaf expressions and non-bindable `Expr`
expressions;
2. an `MLIREmitter` class whose purpose is to:
a. maintain a map of `Bindable` leaf expressions to concrete SSAValue*;
b. provide helper functionality to specify bindings of `Bindable` classes to
SSAValue* while verifying conformable types;
c. traverse the `Expr` and emit the MLIR.
This is used on a concrete example to implement MemRef load/store with clipping in the
LowerVectorTransfer pass. More specifically, the following pseudo-C++ code:
```c++
MLFuncBuilder *b = ...;
Location location = ...;
Bindable zero, one, expr, size;
// EDSL expression
auto access = select(expr < zero, zero, select(expr < size, expr, size - one));
auto ssaValue = MLIREmitter(b)
.bind(zero, ...)
.bind(one, ...)
.bind(expr, ...)
.bind(size, ...)
.emit(location, access);
```
is used to emit all the MLIR for a clipped MemRef access.
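For a scalar index, the clipping expression computes the following. This is an illustrative C++ analogue that assumes size >= 1; `clippedIndex` is hypothetical:

```c++
#include <cstdint>

// Clamps an access index into [0, size - 1] with two nested selects,
// mirroring select(expr < zero, zero, select(expr < size, expr, size - one)).
std::int64_t clippedIndex(std::int64_t expr, std::int64_t size) {
  const std::int64_t zero = 0, one = 1;
  return expr < zero ? zero : (expr < size ? expr : size - one);
}
```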
This simple EDSL can easily be extended to more powerful patterns and should
serve as the counterpart to pattern matchers (and could potentially be unified
once we get enough experience).
In the future, most of this code should be TableGen'd but for now it has
concrete valuable uses: make MLIR programmable in a declarative fashion.
This CL also adds Stmt, proper supporting free functions and rewrites
VectorTransferLowering fully using EDSCs.
The code for creating the EDSCs emitting a VectorTransferReadOp as loops
with clipped loads is:
```c++
Stmt block = Block({
tmpAlloc = alloc(tmpMemRefType),
vectorView = vector_type_cast(tmpAlloc, vectorMemRefType),
ForNest(ivs, lbs, ubs, steps, {
scalarValue = load(scalarMemRef, accessInfo.clippedScalarAccessExprs),
store(scalarValue, tmpAlloc, accessInfo.tmpAccessExprs),
}),
vectorValue = load(vectorView, zero),
tmpDealloc = dealloc(tmpAlloc.getLHS())});
emitter.emitStmt(block);
```
where `accessInfo.clippedScalarAccessExprs` is created with:
```c++
select(i + ii < zero, zero, select(i + ii < N, i + ii, N - one));
```
The generated MLIR resembles:
```mlir
%1 = dim %0, 0 : memref<?x?x?x?xf32>
%2 = dim %0, 1 : memref<?x?x?x?xf32>
%3 = dim %0, 2 : memref<?x?x?x?xf32>
%4 = dim %0, 3 : memref<?x?x?x?xf32>
%5 = alloc() : memref<5x4x3xf32>
%6 = vector_type_cast %5 : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>>
for %i4 = 0 to 3 {
for %i5 = 0 to 4 {
for %i6 = 0 to 5 {
%7 = affine_apply #map0(%i0, %i4)
%8 = cmpi "slt", %7, %c0 : index
%9 = affine_apply #map0(%i0, %i4)
%10 = cmpi "slt", %9, %1 : index
%11 = affine_apply #map0(%i0, %i4)
%12 = affine_apply #map1(%1, %c1)
%13 = select %10, %11, %12 : index
%14 = select %8, %c0, %13 : index
%15 = affine_apply #map0(%i3, %i6)
%16 = cmpi "slt", %15, %c0 : index
%17 = affine_apply #map0(%i3, %i6)
%18 = cmpi "slt", %17, %4 : index
%19 = affine_apply #map0(%i3, %i6)
%20 = affine_apply #map1(%4, %c1)
%21 = select %18, %19, %20 : index
%22 = select %16, %c0, %21 : index
%23 = load %0[%14, %i1, %i2, %22] : memref<?x?x?x?xf32>
store %23, %5[%i6, %i5, %i4] : memref<5x4x3xf32>
}
}
}
%24 = load %6[%c0] : memref<1xvector<5x4x3xf32>>
dealloc %5 : memref<5x4x3xf32>
```
In particular, notice that only 3 of the 4 access dimensions are clipped: this
corresponds to the number of dimensions in the super-vector.
This CL also addresses the cleanups resulting from the review of the previous
CL and performs some refactoring to simplify the abstraction.
PiperOrigin-RevId: 227367414
on this to merge together the classes, but there may be other simplification
possible. I'll leave that to riverriddle@ as future work.
This is step 29/n towards merging instructions and statements.
PiperOrigin-RevId: 227328680
simplifying them in minor ways. The only significant cleanup here
is the constant folding pass. All the other changes are simple and easy,
but this is still enough to shrink the compiler by 45LOC.
The one pass left to merge is the CSE pass, which will be more involved, so I'm
splitting it out to its own patch (which I'll tackle right after this).
This is step 28/n towards merging instructions and statements.
PiperOrigin-RevId: 227328115
Remove an unnecessary restriction in forward substitution. Slightly
simplify LLVM IR lowering, which previously would crash if given an ML
function; it should now produce a clean error if given a function with an
if/for instruction in it, just like it does for any other unsupported op.
This is step 27/n towards merging instructions and statements.
PiperOrigin-RevId: 227324542
representation, shrinking by 70LOC. The PatternRewriter class can probably
also be simplified as well, but one step at a time.
This is step 26/n towards merging instructions and statements. NFC.
PiperOrigin-RevId: 227324218
- drop these unused/incomplete sketches given the new design
@albertcohen is working on, and given that FlatAffineConstraints is now
stable and fast enough for all the analyses/transforms that depend on it.
PiperOrigin-RevId: 227322739
- introduce PostDominanceInfo in the right/complete way and use that for post
dominance check in store-load forwarding
- replace all uses of Analysis/Utils::dominates/properlyDominates with
DominanceInfo::dominates/properlyDominates
- drop all redundant copies of dominance methods in Analysis/Utils/
- in pipeline-data-transfer, replace dominates call with a much less expensive
check; similarly, substitute dominates() in checkMemRefAccessDependence with
a simpler check suitable for that context
- fix a bug in properlyDominates
- improve doc for 'for' instruction 'body'
PiperOrigin-RevId: 227320507
- dominates() for blocks was assuming that there was only a single block at the
top level whenever there was a hierarchy of blocks (as in the case of 'for'/'if'
instructions).
- fix the comments as well
PiperOrigin-RevId: 227319738
function pass, and eliminating the need to copy over code and do
interprocedural updates. While here, also improve it to make fewer empty
blocks, and rename it to "LowerIfAndFor" since that is what it does. This is
a net reduction of ~170 lines of code.
As drive-bys, change the splitBlock method to *not* insert an unconditional
branch, since that behavior is annoying for all clients. Also improve the
AsmPrinter to not crash when a block is referenced that isn't linked into a
function.
PiperOrigin-RevId: 227308856
PrintOpStatsPass maintains state (op stats) across functions and does
per-module work, so it should be a module pass.
PiperOrigin-RevId: 227294151
- the load/store forwarding relies on memref dependence routines as well as
SSA/dominance to identify the memref store instance uniquely supplying a value
to a memref load, and replaces the result of that load with the value being
stored. The memref is also deleted when possible if only stores remain.
- add methods for post dominance for MLFunction blocks.
- remove duplicated getLoopDepth/getNestingDepth - move getNestingDepth,
getMemRefAccess, getNumCommonSurroundingLoops into Analysis/Utils (were
earlier static)
- add a helper method in FlatAffineConstraints - isRangeOneToOne.
PiperOrigin-RevId: 227252907
better order.
- update isEmpty() to eliminate IDs in a better order. Speed improvement for
complex cases (e.g., high-d reshape's involving mod's/div's).
- minor efficiency update to projectOut (was earlier making an extra albeit
benign call to gaussianEliminateIds) (NFC).
- move getBestIdToEliminate further up in the file (NFC).
- add the failing test case.
- add debug info to checkMemRefAccessDependence.
PiperOrigin-RevId: 227244634
Function::walk functionality into f->walkInsts/Ops which allows visiting all
instructions, not just ops. Eliminate Function::getBody() and
Function::getReturn() helpers which crash in CFG functions, and were only kept
around as a bridge.
This is step 25/n towards merging instructions and statements.
PiperOrigin-RevId: 227243966
printing the entry block in a CFG function's argument line. Since I'm touching
all of the testcases anyway, change the argument list from printing as
"%arg : type" to "%arg: type" which is more consistent with bb arguments.
In addition to being more consistent, this is a much nicer look for cfg functions.
PiperOrigin-RevId: 227240069
have a designator. This improves diagnostics and merges handling between CFG
and ML functions more. This also eliminates hard coded parser knowledge of
terminator keywords, allowing dialects to define their own terminators.
PiperOrigin-RevId: 227239398
requires enhancing DominanceInfo to handle the structure of an ML function,
which is required anyway. Along the way, this also fixes a const correctness
problem with Instruction::getBlock().
This is step 24/n towards merging instructions and statements.
PiperOrigin-RevId: 227228900
the function signature, giving them common functionality to ml functions. This
is a strictly additive patch that adds new capability without changing behavior
in a significant way (other than a few diagnostic cleanups). A subsequent
patch will change the printer to use this behavior, which will require updating
a ton of testcases. :)
This exposes the fact that we need to make a grammar change for block
arguments, as is tracked by b/122119779
This is step 23/n towards merging instructions and statements, and one of the
first steps towards eliminating the "cfg vs ml" distinction at a syntax and
semantic level.
PiperOrigin-RevId: 227228342
by ~80 lines. This causes a slight change to diagnostics, but
is otherwise behavior preserving.
This is step 22/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227187857
consistent and moving the using declarations over. Hopefully this is the last
truly massive patch in this refactoring.
This is step 21/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227178245
- extend/complete dependence tester to utilize local var info while adding
access function equality constraints; one step closer to getting slicing-based
fusion working in the general case of affine_apply's involving mod's/div's.
- update test case to reflect more accurate dependence information; remove
inaccurate comment on test case mod_deps.
- fix a minor "bug" in equality addition in addMemRefAccessConstraints (doesn't
affect correctness, but the fixed version is more intuitive).
- some more surrounding code clean up
- move simplifyAffineExpr out of anonymous AffineExprFlattener class - the
latter has state, and the former should reside outside.
PiperOrigin-RevId: 227175600
The last major renaming is Statement -> Instruction, which is why Statement and
Stmt still appear in various places.
This is step 19/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227163082
Add a convenience wrapper to make it easier to iterate over attributes and operands of operators defined in the TableGen file. Use this class in RewriterGen (not used in the op generator yet, will do shortly). Change RewriterGen to pass the bound arguments explicitly; this is in preparation for multi-op matching.
PiperOrigin-RevId: 227156748
by about 100 LOC), without changing any existing behavior.
This is step 20/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227155000
StmtResult -> InstResult, StmtOperand -> InstOperand, and remove the old names.
This is step 17/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227121537
OperationInst derives from it. This allows eliminating some forwarding
functions, other complex code handling multiple paths, and the 'isStatement'
bit tracked by Operation.
This is the last patch I think I can make before the big mechanical change
merging Operation into OperationInst, coming next.
This is step 15/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227077411
Sometimes we have to get the raw value of the FloatAttr to invoke APIs from
non-MLIR libraries (e.g. in the tpu_ops.inc and convert_tensor.cc files). Using
`FloatAttr::getValue().convertToFloat()` and
`FloatAttr::getValue().convertToDouble()` is not safe because internally they
check the semantics of the APFloat in the attribute, and the semantics are not
always specified (the default is f64, in which case convertToFloat fails) or
are inferred incorrectly (for example, using 1.0 instead of 1.f for IEEEFloat).
Calling these convert methods without knowing the semantics can crash
the compiler.
This new method converts the value of a FloatAttr to double even if it loses
precision. Currently this method can be used to read in f32 data from arrays.
PiperOrigin-RevId: 227076616
#includes so Statements.h includes Operation.h but nothing else does. This is
in preparation to eliminate the Operation class and the complexity it brings
with it. I split this patch off because it is just moving stuff around, the
next patch will be more complex.
This is step 14/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227071777
StmtSuccessorIterator/StmtSuccessorIterator, and rename and move the
CFGFunctionViewGraph pass to ViewFunctionGraph.
This is step 13/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227069438
FuncBuilder class. Also rename SSAValue.cpp to Value.cpp
This is step 12/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227067644
is the new base of the SSA value hierarchy. This CL also standardizes all the
nomenclature and comments to use 'Value' where appropriate. This also eliminates a large number of cast<MLValue>(x)'s, which is very soothing.
This is step 11/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227064624
This *only* changes the internal data structures, it does not affect the user visible syntax or structure of MLIR code. Function gets new "isCFG()" sorts of predicates as a transitional measure.
This patch is gross in a number of ways, largely in an effort to reduce the amount of mechanical churn in one go. It introduces a bunch of using decls to keep the old names alive for now, and a bunch of stuff needs to be renamed.
This is step 10/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227044402
The existing implementation of isContiguousAccess asserts that one of the
function arguments is within a certain range, depending on another parameter.
However, the value of this argument may come from outside; in particular, in the
loop vectorization pass it may come from command-line arguments. This leads
to 'mlir-opt' crashing on an assertion depending on flags. Handle the error
gracefully by reporting an error and returning a negative result instead. This
negative result prevents any further transformation by the vectorizer, so the IR
remains valid.
PiperOrigin-RevId: 227029496
Move PrintOpStatsPass out of tools and to other passes (moved to Analysis as it
doesn't modify the program, but it is different from the other analysis passes
as its only consumer at present is the user).
PiperOrigin-RevId: 227018996
making it more similar to the CFG side of things. It is true that in a deeply
nested case that this is not a guaranteed O(1) time operation, and that 'get'
could lead compiler hackers to think this is cheap, but we need to merge these
and we can look into solutions for this in the future if it becomes a problem
in practice.
This is step 9/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 226983931
For performance/memory saving purposes, having the Instruction hold a
std::vector for its operands isn't a good tradeoff. The only reason for
this was to easily support adding/removing BasicBlock arguments to terminators.
Since this isn't the most common operation, we instead force a pre-allocated
list of operands on Instructions at creation time.
PiperOrigin-RevId: 226981227
graph specializations for doing CFG traversals of ML Functions, making the two
sorts of functions have the same capabilities.
This is step 8/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 226968502
BlockArgument arguments of the entry block instead. This makes MLFunctions and
CFGFunctions work more similarly.
This is step 7/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 226966975
MLFunction, IfStmt, ForStmt even though they currently only contain exactly one
block in that list.
This is step 6/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 226960278
The NameLoc can be used to represent a variable, node or method. The
CallSiteLoc has two fields, one represents the concrete location and another
one represents the caller's location. Multiple CallSiteLocs can be chained as
a call stack.
For example, the following call stack
```
AAA
at file1:1
at file2:135
at file3:34
```
can be represented by call0, constructed as follows:
```
auto name = NameLoc::get("AAA");
auto file1 = FileLineColLoc::get("file1", 1);
auto file2 = FileLineColLoc::get("file2", 135);
auto file3 = FileLineColLoc::get("file3", 34);
auto call2 = CallSiteLoc::get(file2, file3);
auto call1 = CallSiteLoc::get(file1, call2);
auto call0 = CallSiteLoc::get(name, call1);
```
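A self-contained model of the same chaining, using hypothetical `Loc` structs rather than the MLIR location classes, shows how nesting CallSite locations yields the call stack shown above:

```c++
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified location model (not the MLIR API).
struct Loc {
  enum Kind { Name, FileLine, CallSite };
  Kind kind = Name;
  std::string text;                     // Name: identifier; FileLine: file name
  int line = 0;                         // FileLine only
  std::shared_ptr<Loc> callee, caller;  // CallSite only
};
using LocPtr = std::shared_ptr<Loc>;

LocPtr nameLoc(const std::string &name) {
  auto loc = std::make_shared<Loc>();
  loc->kind = Loc::Name;
  loc->text = name;
  return loc;
}

LocPtr fileLineLoc(const std::string &file, int line) {
  auto loc = std::make_shared<Loc>();
  loc->kind = Loc::FileLine;
  loc->text = file;
  loc->line = line;
  return loc;
}

// Pairs the concrete location (callee) with the location of its caller.
LocPtr callSiteLoc(LocPtr callee, LocPtr caller) {
  auto loc = std::make_shared<Loc>();
  loc->kind = Loc::CallSite;
  loc->callee = std::move(callee);
  loc->caller = std::move(caller);
  return loc;
}

// Prints the callee first, then one "at" line per enclosing caller.
std::string printStack(const LocPtr &loc) {
  switch (loc->kind) {
  case Loc::Name:
    return loc->text;
  case Loc::FileLine:
    return loc->text + ":" + std::to_string(loc->line);
  case Loc::CallSite:
    return printStack(loc->callee) + "\n  at " + printStack(loc->caller);
  }
  return "";
}
```

Building call2, call1, and call0 in the same order as above and printing call0 gives "AAA", then "at file1:1", "at file2:135", and "at file3:34" on successive lines.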
PiperOrigin-RevId: 226941797
Supervectorization uses null pointers to SSA values as a means of communicating
the failure to vectorize. In operation vectorization, all operations producing
the values of operation arguments must be vectorized for the given operation to
be vectorized. The existing check verified if any of the value "def"
statements was vectorized instead, sometimes leading to assertions inside `isa`
called on a null pointer. Fix this to check that all "def" statements were
vectorized.
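The difference between the old and fixed checks is `any_of` versus `all_of` over the defining operations. The sketch below uses a hypothetical model in which a null pointer marks a def that failed to vectorize:

```c++
#include <algorithm>
#include <vector>

struct VectorizedOp {};  // hypothetical placeholder for a vectorized def
using Defs = std::vector<const VectorizedOp *>;

// Old check: succeeds if ANY def was vectorized, so a null entry can still
// reach code that assumes every operand has a vectorized def.
bool anyDefVectorized(const Defs &defs) {
  return std::any_of(defs.begin(), defs.end(),
                     [](const VectorizedOp *op) { return op != nullptr; });
}

// Fixed check: the operation is vectorized only if ALL defs were vectorized.
bool allDefsVectorized(const Defs &defs) {
  return std::all_of(defs.begin(), defs.end(),
                     [](const VectorizedOp *op) { return op != nullptr; });
}
```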
PiperOrigin-RevId: 226941552
The binary subtraction operations were not supported by the lowering because
they were not essential for the testing flow. Add support for these
operations.
PiperOrigin-RevId: 226941463
from it. This is necessary progress to squaring away the parent relationship
that a StmtBlock has with its enclosing if/for/fn, and makes room for functions
to have more than one block in the future. This also removes IfClause and ForStmtBody.
This is step 5/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 226936541
for SSA values in terminators, but easily worked around. At the same time,
move the StmtOperand list in an OperationStmt to the end of its trailing
objects list so we can *reduce* the number of operands, without affecting
offsets to the other stuff in the allocation.
This is important because we want OperationStmts to be consecutive, including
their operands - we don't want to use an std::vector of operands like
Instructions have.
This is patch 4/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 226865727
clients to use OperationState instead. This makes MLFuncBuilder more similar
to CFGFuncBuilder. This whole area will get tidied up more when cfg and ml
worlds get unified. This patch is just gardening, NFC.
PiperOrigin-RevId: 226701959