- hoist DMAs past all loops immediately surrounding the region on which the DMA
is invariant - do this at DMA generation time itself
PiperOrigin-RevId: 234628447
Originally, edsc::Expr had a long enum edsc::ExprKind with all supported types
of operations. Recent Expr extensibility support removed the need to specify
supported types in advance. Replace the no-longer-used blocks of enum values
reserved for unary/binary/ternary/variadic expressions with simple values (it
is still useful to know if an expression is, e.g., binary to access it through
a simpler API).
Furthermore, wrap the string comparison now used to identify specific ops into
an `Expr::is_op<>` function template that acts similarly to `Instruction::isa<>`.
Introduce `{Unary,Binary,Ternary,Variadic}Expr::make<>` function templates that
create an Expr emitting the MLIR Op specified as the template argument.
PiperOrigin-RevId: 234612916
Expose the result types of edsc::Expr, which are now stored for all types of
Exprs and not only for the variadic ones. Require return types when an Expr is
constructed, if it will ever have some. An empty return type list is
interpreted as an Expr that does not create a value (e.g. `return` or `store`).
Conceptually, all edsc::Exprs are now typed, with the type being a (potentially
empty) tuple of return types. Unbound expressions and Bindables must now be
constructed with a specific type they will take. This makes EDSC less
evidently type-polymorphic, but we can still write generic code such as
Expr sumOfSquares(Expr lhs, Expr rhs) { return lhs * lhs + rhs * rhs; }
and use it to construct different typed expressions as
sumOfSquares(Bindable(IndexType::get(ctx)), Bindable(IndexType::get(ctx)));
sumOfSquares(Bindable(FloatType::getF32(ctx)),
Bindable(FloatType::getF32(ctx)));
On the positive side, we get the following.
1. We can now perform type checking when constructing Exprs rather than during
MLIR emission. Nevertheless, this still duplicates Op::verify() until we can
factor out type checking from it.
2. MLIREmitter is significantly simplified.
3. ExprKind enum is only used for actual kinds of expressions. Data structures
are converging with AbstractOperation, and the users can now create a
VariadicExpr("canonical_op_name", {types}, {exprs}) for any operation, even
an unregistered one without having to extend the enum and make pervasive
changes to EDSCs.
On the negative side, we get the following.
1. Typed bindables are more verbose, even in Python.
2. We lose the ability to do print debugging for higher-level EDSC abstractions
that are implemented as multiple MLIR Ops, for example logical disjunction.
This is step 2/n towards making EDSC extensible.
***
Move MLIR Op construction from MLIREmitter::emitExpr to Expr::build since Expr
now has sufficient information to build itself.
This is step 3/n towards making EDSC extensible.
Both of these strive to minimize the amount of irrelevant changes. In
particular, this introduces more complex pretty-printing for affine and binary
expressions to make sure tests continue to pass. It also relies on string
comparison to identify specific operations that an Expr produces.
PiperOrigin-RevId: 234609882
This CL extended the TableGen Operator class to provide accessors for information on op
results.
In OpDefinitionGen, added checks to make sure only the last result can be variadic,
and adjusted traits and builders generation to consider variadic results.
PiperOrigin-RevId: 234596124
EDSC currently implement a block as a statement that is itself a list of
statements. This suffers from two modeling problems: (1) these blocks are not
addressable, i.e. one cannot create an instruction for which the block thus constructed
is a successor; (2) they support block nesting, which is not supported by MLIR
blocks. Furthermore, emitting such a "compound statement" (misleadingly named
`Block` in Python bindings) does not actually produce a new Block in the IR.
Implement support for creating actual IR Blocks in EDSC. In particular, define
a new StmtBlock EDSC class that is neither an Expr nor a Stmt but contains a
list of Stmts. Additionally, StmtBlock may have (early-) typed arguments.
These arguments are Bindable expressions that can be used inside the block.
Provide two calls in the MLIREmitter, `emitBlock` that actually emits a new
block and `emitBlockBody` that only emits the instructions contained in the
block without creating a new block. In the latter case, the instructions must
not use block arguments.
Update Python bindings to make it clear when instruction emission happens
without creating a new block.
PiperOrigin-RevId: 234556474
The parameter to emitStandaloneParamBuilder() was renamed from hasResultType to
isAllSameType, which has the opposite boolean meaning. The logic is changed
here to make them consistent.
Also re-ordered some methods in Operator, and a few other tiny improvements.
PiperOrigin-RevId: 234478316
generation pass to make it drop certain assumptions, complete TODOs.
- multiple fixes for getMemoryFootprintBytes
- pass loopDepth correctly from getMemoryFootprintBytes()
- use union while computing memory footprints
- bug fixes for addAffineForOpDomain
- take into account loop step
- add domains of other loop IVs in turn that might have been used in the bounds
- dma-generate: drop assumption of "non-unit stride loops being tile space loops
and skipping those and recursing to inner depths"; DMA generation is now purely
based on the available fast mem capacity and the calculated memory footprint
- handle memory region compute failures/bailouts correctly from dma-generate
- loop tiling cleanup/NFC
- update some debug and error messages to use emitNote/emitError in
pipeline-data-transfer pass - NFC
PiperOrigin-RevId: 234245969
The existing implementation of makeFunctionType in EDSC contains a bug: the
array of input types is overwritten using output types passed as arguments and
the array of output types is never filled in. This leads to all sorts of
incorrect memory behavior. Fill in the array of output types using the proper
argument.
PiperOrigin-RevId: 234177221
A recent change introduced the possibility to run LLVM IR transformations during
JIT-compilation in the ExecutionEngine. Provide helper functions that
construct IR transformers given either clang-style optimization levels or a
list of passes to run. The latter wraps the LLVM command line option parser to
parse strings rather than actual command line arguments. As a result, we can
run either of
mlir-cpu-runner -O3 input.mlir
mlir-cpu-runner -some-mlir-pass -llvm-opts="-llvm-pass -other-llvm-pass"
to combine different transformations. The transformer builder functions are
provided as a separate library that depends on LLVM pass libraries, unlike the
main execution engine library. The library can be used for integrating MLIR
execution engine into external frameworks.
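As a rough conceptual sketch (not the actual API; the helper names and signatures below are assumptions for illustration only), the transformer builders boil down to callables applied to the LLVM module:
```cpp
#include <functional>
#include <string>

namespace llvm {
class Module;
}

// A "transformer" is just a callable applied to the LLVM module before JIT
// compilation; the helper names below are assumed, not the library's spelling.
using ModuleTransformer = std::function<void(llvm::Module *)>;

// Built from a clang-style optimization level, e.g. 3 for -O3.
ModuleTransformer makeOptimizingTransformer(unsigned optLevel);
// Built from a string of pass flags, e.g. "-llvm-pass -other-llvm-pass",
// parsed with the LLVM command line option parser.
ModuleTransformer makePassListTransformer(const std::string &passFlags);
```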
PiperOrigin-RevId: 234173493
*) Adds utility to LoopUtils to perform loop interchange of two AffineForOps.
*) Adds utility to LoopUtils to sink a loop to a specified depth within a loop nest, using a series of loop interchanges.
*) Computes dependences between all loads and stores in the loop nest, and classifies each loop as parallel or sequential.
*) Computes the loop interchange permutation required to sink sequential loops (and raise parallel loop nests) while preserving relative order among them (see the sketch after this list).
*) Checks each dependence against the permutation to make sure that dependences would not be violated by the loop interchange transformation.
*) Calls loop interchange in LoopFusion pass on consumer loop nests before fusing in producers, sinking loops with loop carried dependences deeper into the consumer loop nest.
*) Adds and updates related unit tests.
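The permutation computation referenced above can be illustrated with a generic sketch (not the pass's actual code): after classifying each loop as parallel or sequential, a stable partition of the loop indices raises parallel loops and sinks sequential ones while preserving relative order within each group.
```cpp
#include <algorithm>
#include <vector>

// Generic sketch: given per-loop classifications (true = parallel), compute an
// interchange permutation that moves parallel loops outermost and sequential
// loops innermost while preserving relative order within each group.
std::vector<unsigned> computeSinkPermutation(const std::vector<bool> &isParallel) {
  std::vector<unsigned> perm(isParallel.size());
  for (unsigned i = 0; i < perm.size(); ++i)
    perm[i] = i;
  std::stable_partition(perm.begin(), perm.end(),
                        [&](unsigned idx) { return isParallel[idx]; });
  return perm;
}
// E.g. {parallel, sequential, parallel} -> permutation {0, 2, 1}.
```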
PiperOrigin-RevId: 234158370
We specify op operands and results in TableGen op definition using the same syntax.
They should be modelled similarly in TableGen driver wrapper classes.
PiperOrigin-RevId: 234153332
Add support for converting MLIR `call_indirect` instructions to the LLVM IR
dialect. In LLVM IR, the same instruction is used for direct and indirect
calls. In the dialect, we have `llvm.call` and `llvm.call0` to work around the
absence of the void type in MLIR. For direct calls, the callee is stored as
an instruction attribute. Use the same pair of instructions for indirect calls by
omitting the callee attribute. In the MLIR to LLVM IR translator, check the
presence of the attribute to decide whether to construct a direct or an indirect
call using different LLVM IR Builder functions.
Add support for converting constants of function type to the LLVM IR dialect
and for translating them to the LLVM IR proper. The `llvm.constant` operation
works similarly to other types: its attribute has MLIR function type but the
value it produces has LLVM IR function type wrapped in the dialect type. While
lowering, look up the pointer to the converted function in the corresponding
mapping.
PiperOrigin-RevId: 234132351
Function types are built-in in MLIR and affect the validity of the IR itself.
However, advanced target dialects such as the LLVM IR dialect may include
custom function types. Until now, dialect conversion was expecting function
types not to be converted to the custom type: although the signature was
allowed to change, the outer type must have been an mlir::FunctionType. This
effectively prevented dialect conversion from creating instructions that
operate on values of the custom function type.
Dissociate function signature conversion from general type conversion.
Function signature conversion must still produce an mlir::FunctionType and is
used in places where built-in types are required to make IR valid. General
type conversion is used for SSA values, including function and block arguments
and function results.
Exercise this behavior in the LLVM IR dialect conversion by converting function
types to LLVM IR function pointer types. The pointer to a function is chosen
to provide consistent lowering of higher-order functions: while it is possible
to have a value of function type, it is not possible to create a function type
accepting or returning another function type.
PiperOrigin-RevId: 234124494
Update FlatAffineConstraints::getLower/UpperBounds to project to the identifier for which bounds are being computed. This change enables computing bounds on an identifier which were previously dependent on the bounds of another identifier.
PiperOrigin-RevId: 234017514
for dma-generate, loop-unroll.
- add -tile-sizes command line option for loop tiling to specify different tile
sizes for loops in a band
- clean up command line options for loop-unroll, dma-generate (remove
cl::hidden)
PiperOrigin-RevId: 234006232
If we see an add op adding a constant value to a convolution op with constant
bias, we can fuse the add into the convolution op by constant folding the
bias and the add op's constant operand.
This CL also removes a dangling RewriterGen check that prevented us from using
nested DAG nodes in result patterns, which is already supported.
PiperOrigin-RevId: 233989654
For ops with the SameOperandsAndResultType trait, we know that all result types
should be the same as the first operand's type. So we can generate a build()
method without requiring result types as parameters and also invoke this method
when constructing such ops during expanding rewrite patterns.
Similarly, for ops that have broadcast behavior, we can define a build() method
that uses the deduced type as the result type. So we can also call into this
build() method when constructing ops in RewriterGen.
PiperOrigin-RevId: 233988307
In LowerEDSCTestPass, there are two range-for loops that only do assertions on
the loop variable. With assertions disabled, the variable becomes unused and
triggers a warning promoted to an error. Cast it to void in the loop to suppress
the warning.
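For illustration, the general pattern is the following (a generic sketch, not the pass's actual loop):
```cpp
#include <cassert>
#include <vector>

// When NDEBUG disables assert(), `v` would otherwise be unused and trigger
// -Wunused-variable (promoted to an error); the cast to void suppresses it.
void checkAllNonNegative(const std::vector<int> &values) {
  for (int v : values) {
    (void)v;
    assert(v >= 0 && "expected non-negative value");
  }
}
```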
PiperOrigin-RevId: 233936171
The original implementation of the translation from MLIR to LLVM IR operated on the
Standard+BuiltIn dialect, with a later addition of the SuperVector dialect.
This required the translation to be aware of a potentially large number of other
dialects as the infrastructure extended. With the recent introduction of the
LLVM IR dialect into MLIR, the translation can be switched to only translate
the LLVM IR dialect, and the translation of the operations becomes largely
mechanical.
The reimplementation of the translator follows the lines of the original
translator in function and basic block conversion. In particular, block
arguments are converted to LLVM IR PHI nodes, which are connected to their
sources after all blocks of a function have been converted. Thanks to LLVM IR
types being wrapped in the MLIR LLVM dialect type, type conversion is
simplified to only convert function types, all other types are simply
unwrapped. Individual instructions are constructed using the LLVM IRBuilder,
which has a great potential for being table-generated from the LLVM IR dialect
operation definitions.
The input of the test/Target/llvmir.mlir is updated to use the MLIR LLVM IR
dialect. While it is now redundant with the dialect conversion test, the point
of the exercise is to guarantee exactly the same LLVM IR is emitted. (Only the
name of the allocation function is changed from `__mlir_alloc` to `alloc` in
the CHECK lines.) It will be simplified in a follow-up commit.
PiperOrigin-RevId: 233842306
EDSC expressions evolved to have different types of underlying storage.
Separate classes are used for unary, binary, ternary and variadic expressions.
The latter covers all the needs of the three special cases. Remove these
special cases and use a single ExprStorage class everywhere while maintaining
the same APIs at the Expr level (ExprStorage is an internal implementation
class).
This is step 1/n to converging EDSC expressions and Ops and making EDSCs
support custom operations.
PiperOrigin-RevId: 233704912
DimOp is converted to a constant LLVM IR dialect operation for static
dimensions and to an access to the dynamic size info stored in the memref
descriptor for the dynamic dimensions. This behavior is consistent with the
existing mlir-translator.
This completes the porting of MLIR -> LLVM lowering to the dialect conversion
infrastructure.
PiperOrigin-RevId: 233665634
- for the DMA transfers being pipelined through double buffering, generate
deallocs for the double buffers being alloc'ed
This change is along the lines of cl/233502632. We initially wanted to experiment with
scoped allocation - so the deallocations were usually not necessary; however, they are
needed even with scoped allocations in some situations - e.g., when the enclosing loop
gets unrolled. The dealloc serves as an end-of-lifetime marker.
PiperOrigin-RevId: 233653463
In the current state, edsc::Expr and edsc::Stmt overload operators to construct
other Exprs and Stmts. This includes some unconventional overloads of the
`operator==` to create a comparison expression and of the `operator!` to create
a negation expression. This situation could lead to unpleasant surprises where
the code does not behave as expected. Make all Expr and Stmt construction
operators free functions and move them to the `edsc::op` namespace. Callers
willing to use these operators must explicitly include them with the `using`
declaration. This can be done in some local scope.
Additionally, we currently emit signed comparisons for order-comparison
operators. With namespaces, we can later introduce two sets of operators in
different namespaces, e.g. `edsc::op::sign` and `edsc::op::unsign` to clearly
state which kind of comparison is implied.
PiperOrigin-RevId: 233578674
The existing implementation in EDSC of 'for' loops in MLIREmitter is
unnecessarily restricted to constant bounds. The underlying AffineForOp can be
constructed from (a list of) Values and AffineMaps instead of constants. Its
verifier will check that the "affine provenance" conditions are respected,
i.e. that the values used in the loop conditions are defined in such a way that
they can be analyzed by affine passes. One can use non-constant values in
affine loop bounds in conjunction with a single-dimensional identity affine
map. Implement this in MLIREmitter while maintaining the special case for
constant bounds that leads to significantly simpler generated IR when
applicable.
Test this change using the EDSC lowering test pass to inject code emitted from
EDSC into functions with predefined names.
PiperOrigin-RevId: 233578220
Associates opaque constants with a particular dialect. Adds general mechanism to register dialect-specific hooks defined in external components. Adds hooks to decode opaque tensor constant and extract an element of an opaque tensor constant.
This CL does not change the existing mechanism for registering the constant folding hook yet. One thing at a time.
PiperOrigin-RevId: 233544757
- for the DMA buffers being allocated (and their tags), generate corresponding deallocs
- minor related update to replaceAllMemRefUsesWith and PipelineDataTransfer pass
Code generation for DMA transfers was being done with the initial simplifying
assumption that the alloc's would map to scoped allocations, and so no
deallocations would be necessary. Drop this assumption to generalize. Note that
even with scoped allocations, unrolling loops that have scoped allocations
could create a series of allocations and exhaustion of fast memory. Having an
end-of-lifetime marker like a dealloc in fact allows creating new scopes if
necessary when lowering to a backend while still utilizing scoped allocation.
DMA buffers created by -dma-generate are guaranteed to have either
non-overlapping lifetimes or nested lifetimes.
PiperOrigin-RevId: 233502632
- determine symbols for the memref region correctly
- this wasn't exposed earlier since we didn't have any test cases where the
portion of the nest being DMAed for was non-hyperrectangular (i.e., bounds of
one IV depending on other IVs within that part)
PiperOrigin-RevId: 233493872
* Add common broadcastable binary adder in TF ops and use for a few ops;
- Adding Sub, Mul here
* Change the prepare lowering to use TF variants;
* Add some more legalization patterns;
PiperOrigin-RevId: 233310952
* Fixed tfl.conv_2d and tfl.depthwise_conv_2d to have a fused activation
function attribute
* Fixed RewriterGen crash: trying to get attribute match template when
the matcher is unspecified (UnsetInit)
PiperOrigin-RevId: 233241755
This CL allowed developers to write result ops having nested DAG nodes as their
arguments. Now we can write
```
def : Pat<(...), (AOp (BOp, ...), AOperand)>
```
PiperOrigin-RevId: 233207225
Previously we were using PatternRewriter::replaceOpWithNewOp() to both create the new op
inline and rewrite the matched op. That does not work well if we want to generate multiple
ops in a sequence. To support that, this CL changed to assign each newly created op to a
separate variable.
This CL also refactors how PatternEmitter performs the directive dispatch logic.
PiperOrigin-RevId: 233206819
Add support for converting `memref_cast` operations into the LLVM IR dialect.
This goes beyond what is currently implemented in the MLIR standard ops to LLVM
IR translation, but follows the general principles of the memref descriptors.
A memref cast creates a new descriptor containing the same buffer pointer but a
potentially different number of dynamic sizes (as many as dynamic dimensions in
the target memref type). The lowering copies the buffer pointer to the new
descriptor and inserts dynamic sizes into it. If the size is static in the
source type, a constant value is inserted as the dynamic size, otherwise a
dynamic value is copied from the source descriptor, taking into account the
difference in dynamic size positions in the descriptor.
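For illustration (a simplified C++ analogue, not the actual lowering code, which operates on LLVM IR struct values via insertvalue/extractvalue): casting a descriptor for `memref<?x4xf32>` to one for `memref<?x?xf32>` copies the buffer pointer, copies the existing dynamic size, and materializes the static size 4 as a constant.
```cpp
#include <cstdint>

struct SrcDescriptor { // corresponds to memref<?x4xf32>
  float *data;
  int64_t size0; // dynamic size of dimension 0
};
struct DstDescriptor { // corresponds to memref<?x?xf32>
  float *data;
  int64_t size0, size1; // both dimensions are dynamic in the target type
};

DstDescriptor castDescriptor(const SrcDescriptor &src) {
  DstDescriptor dst;
  dst.data = src.data;   // same underlying buffer
  dst.size0 = src.size0; // dynamic size copied from the source descriptor
  dst.size1 = 4;         // static source size materialized as a constant
  return dst;
}
```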
PiperOrigin-RevId: 233082035
Make sure the module is always passed to the optimization layer.
Drop unused default argument for the IR transformation and remove the function
that was only used in this default argument. The transformation wrapper
constructor already checks for the null function, so the caller can just pass
`{}` if they don't want any transformation (no callers currently need this).
PiperOrigin-RevId: 233068817
Implement the lowering of memref load and store standard operations into the
LLVM IR dialect. This largely follows the existing mechanism in
MLIR-to-LLVM-IR translation for the sake of compatibility. A memref value is
transformed into a memref descriptor value which holds the pointer to the
underlying data buffer and the dynamic memref sizes. The data buffer is
contiguous. Accesses to multidimensional memrefs are linearized in row-major
form. In linear address computation, statically known sizes are used as
constants while dynamic sizes are extracted from the memref descriptor.
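For illustration, the row-major linearization amounts to the following (a standalone sketch, not the lowering code itself; in the generated IR, static sizes appear as constants and dynamic ones are read from the descriptor):
```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Row-major linearization: index (i0, i1, ..., in-1) into sizes (d0, ..., dn-1)
// maps to ((i0 * d1 + i1) * d2 + i2) * ... + in-1.
int64_t linearIndex(const std::vector<int64_t> &indices,
                    const std::vector<int64_t> &sizes) {
  assert(indices.size() == sizes.size() && "rank mismatch");
  int64_t linear = 0;
  for (size_t i = 0, e = indices.size(); i < e; ++i)
    linear = linear * sizes[i] + indices[i];
  return linear;
}
```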
PiperOrigin-RevId: 233043846
That allows TensorFlow Add and Div ops to use the Broadcastable op trait instead
of the more restrictive SameValueType op trait.
That in turn allows TensorFlow ops to be registered by defining GET_OP_LIST and
including the generated ops file. Currently, tf-raise-control-flow pass tests
are using dynamic shapes in tf.Add op and AddOp can't be registered without
supporting the dynamic shapes.
TESTED with unit tests
PiperOrigin-RevId: 232927998
* Add tf.LeakyRelu op definition + folders (well, one is really a canonicalizer)
* Change generated error message to use attribute description instead;
* Change the return type of F32Attr to be APFloat - internally it is already
stored as APFloat so let the caller decide if they want to convert it or
not. I could see varying opinions here though :) (did not change i32attr
similarly)
PiperOrigin-RevId: 232923358
Aggregate types where at least one dimension is zero do not fully make sense as
they cannot contain any values (their total size is zero). However, TensorFlow
and XLA support tensors with zero sizes, so we must support those too. This is
relatively safe since, unlike vectors and memrefs, we don't have first-class
element accessors for MLIR tensors.
To support sparse element attributes of vector types that have no non-zero
elements, make sure that index and value element attributes have tensor type so
that we never need to create a zero vector type internally. Note that this is
already consistent with the inline documentation of the sparse elements
attribute. Users of the sparse elements attribute should not rely on the
storage schema anyway.
PiperOrigin-RevId: 232896707
Existing IR syntax is ambiguous in type declarations in presence of zero sizes.
In particular, `0x1` in the type size can be interpreted as either a
hexadecimal literal corresponding to 1, or as two distinct decimal literals
separated by an `x` for sizes. Furthermore, the shape `<0xi32>` fails lexing
because it is expected to be an integer literal.
Fix the lexer to treat `0xi32` as an integer literal `0` followed by a bare
identifier `xi32` (look one character ahead and early return instead of
erroring out).
Disallow hexadecimal literals in type declarations and forcibly split the token
into multiple parts while parsing the type. Note that the splitting trick has
been already present to separate the element type from the preceding `x`
character.
PiperOrigin-RevId: 232880373
The current ExecutionEngine flow generates the LLVM IR from MLIR and
JIT-compiles it as is without any transformation. It thus misses the
opportunity to perform optimizations supported by LLVM or collect statistics
about the module. Modify the Orc JITter to perform transformations on the LLVM
IR. Accept an optional LLVM module transformation function when constructing
the ExecutionEngine and use it while JIT-compiling. This prevents MLIR
ExecutionEngine from depending on LLVM passes; its clients should depend on the
passes they require.
PiperOrigin-RevId: 232877060
Instead, we deduce the result type from the given attribute.
This is in preparation for generating constant ops with TableGen.
PiperOrigin-RevId: 232723467
Implement the lowering of memref allocation and deallocation standard
operations into the LLVM IR dialect. This largely follows the existing
mechanism in MLIR-to-LLVM-IR translation for the sake of compatibility.
A memref value is transformed into a memref descriptor value which holds the
pointer to the underlying data buffer and the dynamic memref sizes. The buffer
is allocated using `malloc` and freed using `free`. The lowering inserts
declarations of these functions if necessary. Memref descriptors are values of
the LLVM IR structure type wrapped into an MLIR LLVM dialect type. The pointer
to the buffer and the individual sizes are accessed using `extractvalue` and
`insertvalue` LLVM IR instructions.
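For illustration, the descriptor for a `memref<?x?xf32>` would correspond to a struct roughly like the following (a hedged sketch of what the description above implies, not a verified spec of the layout):
```cpp
#include <cstdint>

// Rough C++ analogue of the LLVM IR struct wrapped in the dialect type: the
// buffer pointer followed by one field per dynamic dimension.
struct MemRefDescriptor2D { // corresponds to memref<?x?xf32>
  float *data;   // contiguous buffer, allocated with `malloc`, freed with `free`
  int64_t size0; // dynamic size of dimension 0
  int64_t size1; // dynamic size of dimension 1
};
```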
PiperOrigin-RevId: 232719419
*) Adds parameter to public API of MemRefRegion::compute for passing in the slice loop bounds to compute the memref region of the loop nest slice.
*) Exposes public method MemRefRegion::getRegionSize for computing the size of the memref region in bytes.
PiperOrigin-RevId: 232706165
Previously, we were using the trait mechanism to specify that an op has variadic operands.
That led to a discrepancy with how we handle ops with a deterministic number of operands.
Besides, we have no way to specify the constraints and match against the variadic operands.
This CL introduced Variadic<Type> as a way to solve the above issues.
PiperOrigin-RevId: 232656104
* AffineStructures has moved to IR.
* simplifyAffineExpr/simplifyAffineMap/getFlattenedAffineExpr have moved to IR.
* makeComposedAffineApply/fullyComposeAffineMapAndOperands have moved to AffineOps.
* ComposeAffineMaps is replaced by AffineApplyOp::canonicalize and deleted.
PiperOrigin-RevId: 232586468
Motivation for this change is to remove redundant TF type attributes for
TensorFlow ops. For example, tf$T: "tfdtype$DT_FLOAT". Type attributes can be derived using the MLIR operand or result types, attribute names and their mapping. This will also allow constant folding of instructions generated within MLIR (and not imported from TensorFlow) without adding type attributes for the instruction.
Derived attributes are populated while exporting MLIR to TF GraphDef using
auto-generated populators. Populators are only available for the ops that are generated by the TableGen.
Also, fixed Operator::getNumArgs method to exclude derived attributes as they are not
part of the arguments.
TESTED with unit test
PiperOrigin-RevId: 232531561
Existing type syntax contains the following productions:
function-type ::= type-list-parens `->` type-list
type-list ::= type | type-list-parens
type ::= <..> | function-type
Due to these rules, when the parser sees `->` followed by `(`, it cannot
disambiguate if `(` starts a parenthesized list of function result types, or a
parenthesized list of operands of another function type, returned from the
current function. We would need an unknown amount of lookahead to try to find
the `->` at the right level of function nesting to differentiate between type
lists and singular function types.
Instead, require the result type of the function that is a function type itself
to be always parenthesized, at the syntax level. Update the spec and the
parser to correspond to the production rule names used in the spec (although it
would have worked without modifications). Fix the function type parsing bug in
the process, as it used to accept a non-parenthesized list of types for
arguments, which is disallowed by the spec.
PiperOrigin-RevId: 232528361
In the optional attribute dictionary used, among other places, in the generic form of the
ops, attribute types for integers and floats are omitted. This could lead to
inconsistencies when round-tripping the IR, in particular the attributes are
created with incorrect types after parsing (integers default to i64, floats
default to f64). Provide an API to emit a trailing type after the attribute for
integers and floats. Use it while printing the optional attribute dictionary.
Omitting types for i64 and f64 is a pragmatic decision that minimizes changes
in tests. We may want to reconsider in the future and always print types of
attributes in the generic form.
PiperOrigin-RevId: 232480116
*) After a private memref buffer is created for a fused loop nest, dependences on the old memref are reduced, which can open up fusion opportunities. In these cases, users of the old memref are added back to the worklist to be reconsidered for fusion.
*) Fixed a bug in fusion insertion point dependence check where the memref being privatized was being skipped from the check.
PiperOrigin-RevId: 232477853
- use getAccessMap() instead of repeating it
- fold getMemRefRegion into MemRefRegion ctor (more natural, avoid heap
allocation and unique_ptr where possible)
- change extractForInductionVars - MutableArrayRef -> ArrayRef for the
arguments. Since the method is just returning copies of 'Value *', the client
can't mutate the pointers themselves; it's fine to mutate the 'Value's
themselves, but that doesn't mutate the pointers to those.
- change the way extractForInductionVars returns (see b/123437690)
PiperOrigin-RevId: 232359277
- with this we won't see duplicate / unused operands when getting access maps,
or when constructing FlatAffineConstraints based on such maps
- we can probably change fullyComposeAffineMapAndOperands to ensure this
TODO(b/123879896).
PiperOrigin-RevId: 232356600
The generic form may be more desirable even when there is a custom form
specified, so add an option to enable emitting it. This also exposes a current
bug when round-tripping a constant with a function attribute.
PiperOrigin-RevId: 232350712
loops), (2) take into account fast memory space capacity and lower 'dmaDepth'
to fit, (3) add location information for debug info / errors
- change dma-generate pass to work on blocks of instructions (start/end
iterators) instead of 'for' loops; complete TODOs - allows DMA generation for
straight-line blocks of operation instructions interspersed between loops
- take into account fast memory capacity: check whether memory footprint fits
in fastMemoryCapacity parameter, and recurse/lower the depth at which DMA
generation is performed until it does fit in the provided memory
- add location information to MemRefRegion; any insufficient fast memory
capacity errors or debug info w.r.t. DMA generation now show location information
- allow DMA generation pass to be instantiated with a fast memory capacity
option (besides command line flag)
- change getMemRefRegion to return unique_ptr's
- change getMemRefFootprintBytes to work on a 'Block' instead of 'ForInst'
- other helper methods; add postDomInstFilter option for
replaceAllMemRefUsesWith; drop forInst->walkOps, add Block::walkOps methods
Eg. output
$ mlir-opt -dma-generate -dma-fast-mem-capacity=1 /tmp/single.mlir
/tmp/single.mlir:9:13: error: Total size of all DMA buffers' for this block exceeds fast memory capacity
for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {
^
$ mlir-opt -debug-only=dma-generate -dma-generate -dma-fast-mem-capacity=400 /tmp/single.mlir
/tmp/single.mlir:9:13: note: 8 KiB of DMA buffers in fast memory space for this block
for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {
PiperOrigin-RevId: 232297044
They are essentially both modelling MLIR OpTrait; the former achieves the
purpose via introducing corresponding symbols in TableGen, while the latter
just uses plain strings.
Unify them to provide a single mechanism to avoid confusion and to better
reflect the definitions on MLIR C++ side.
Ideally we should be able to deduce lots of these traits automatically via
other bits of op definitions instead of manually specifying them, but not
for now.
PiperOrigin-RevId: 232191401
These attribute kinds are different from the rest in the sense that their types are defined
in MLIR's type hierarchy and we can build constant ops out of them.
By defining this middle-level base class, we have a unified way to test and query the type
of these attributes, which will be useful when constructing constant ops of various dialects.
This CL also added asserts to reject non-NumericAttr in constant op's build() method.
PiperOrigin-RevId: 232188178
- fusion already includes the necessary analysis to create small/local buffers
post fusion; allocate these buffers in a higher memory space if the necessary
pass parameters are provided (threshold size, memory space id)
- although there will be a separate utility at some point to directly detect
and promote small local buffers to higher memory spaces, doing it during fusion
when possible is much less expensive, comes free with fusion analysis, and covers
a key common case.
PiperOrigin-RevId: 232063894
Nothing in the loop can (legally) cause curPtr to become nullptr. And if it did, we
would null dereference right below anyway.
This loop still reads funny to me but doesn't make me stare at it and wonder
what I am missing anymore.
PiperOrigin-RevId: 232062076
This CL added a tblgen::DagLeaf wrapper class with several helper methods for handling
DAG arguments. It helps to refactor the rewriter generation logic to be
higher level.
This CL also added a tblgen::ConstantAttr wrapper class for constant attributes.
PiperOrigin-RevId: 232050683
- getTerminator() on a block can return nullptr; moreover, blocks that are improperly
constructed/transformed by utilities/passes may not have terminators even for the
top-level blocks
PiperOrigin-RevId: 232025963
This CL applies the following simplifications to EDSCs:
1. Rename Block to StmtList because an MLIR Block is a different, not yet
supported, notion;
2. Rework Bindable to drop specific storage and just use it as a simple wrapper
around Expr. The only value of Bindable is to force a static cast when used by
the user to bind into the emitter. For all intended purposes, Bindable is just
a lightweight check that an Expr is Unbound. This simplifies usage and reduces
the API footprint. After playing with it for some time, it wasn't worth the API
cognition overhead;
3. Replace makeExprs and makeBindables by makeNewExprs and copyExprs, which are
more explicit and less easy to misuse;
4. Add generally useful functionality to MLIREmitter:
a. expose zero and one for the ubiquitous common lower bounds and step;
b. add support to create already bound Exprs for all function arguments as
well as shapes and views for Exprs bound to memrefs.
5. Delete Stmt::operator= and replace by a `Stmt::set` method which is more
explicit.
6. Make Stmt::operator Expr() explicit.
7. Indexed.indices assertions are removed to pave the way for expressing slices
and views as well as to work with 0-D memrefs.
The CL plugs those simplifications with TableGen and allows emitting a full MLIR function for
pointwise add.
This "x.add" op is both type and rank-agnostic (by allowing ArrayRef of Expr
passed to For loops) and opens the door to spinning up a composable library of
existing and custom ops that should automate a lot of the tedious work in
TF/XLA -> MLIR.
Testing needs to be significantly improved but can be done in a separate CL.
PiperOrigin-RevId: 231982325
This allows for arbitrarily complex builder patterns, which is meant to cover initial cases while the modelling is improved, as well as long-tail cases for which expanding the DSL would result in a worse overall system.
NFC: just sorting the emit/replace methods alphabetically in the class and file body.
PiperOrigin-RevId: 231890352
This CL introduces a hotfix post refactoring of NestedMatchers:
- fix uninitialized read to skip
- avoid bumpptr allocating with 0 elements
Interestingly the latter issue only surfaced in fastbuild mode with no-san and
manifested itself as a SIGILL. All other combinations that were tried failed to
reproduce the issue (dbg, opt, fastbuild with asan)
PiperOrigin-RevId: 231787642
A performance issue was reported due to the usage of NestedMatcher in
ComposeAffineMaps. The main culprit was the ubiquitous copies that were
occurring when appending even a single element in `matchOne`.
This CL generally simplifies the implementation and removes one level of indirection by getting rid of
auxiliary storage as well as simplifying the API.
The users of the API are updated accordingly.
The implementation was tested on a heavily unrolled example with
ComposeAffineMaps and is now close in performance with an implementation based
on stateless InstWalker.
As a reminder, the whole ComposeAffineMaps pass is slated to disappear but the bug report was very useful as a stress test for NestedMatchers.
Lastly, the following cleanups reported by @aminim were addressed:
1. make NestedPatternContext scoped within runFunction rather than at the Pass level. This was caused by a previous misunderstanding of Pass lifetime;
2. use defensive assertions in the constructor of NestedPatternContext to make it clear a unique such locally scoped context is allowed to exist.
PiperOrigin-RevId: 231781279
This CL addresses some cleanups that were leftover after an incorrect rebase:
1. use StringSwitch
2. use // NOLINTNEXTLINE
3. remove a dead line of code
PiperOrigin-RevId: 231726640
a trivial inst walker :-) (reduces pass time from several minutes non-terminating to 120ms) - (fixes b/123541184)
- use a simple 7-line inst walker to collect affine_apply op's instead of the nested
matcher; -compose-affine-maps pass runs in 120ms now instead of 5 minutes + (non-
terminating / out of memory) - on a realistic test case that is a 20,000-line 12-d
loop nest
- this CL is also pushing for simple existing/standard patterns unless there
is a real efficiency issue (OTOH, fixing nested matcher to address this issue requires
cl/231400521)
- the improvement is from swapping out the nested walker as opposed to from a bug
or anything else that this CL changes
- update stale comment
PiperOrigin-RevId: 231623619
This CL mandated that TypeConstraint and Type provide descriptions and fixed
various subclasses and definitions to do so. The purpose is to enforce
good documentation; using an empty string as the default just invites oversight.
PiperOrigin-RevId: 231579629
* Emitted result lists for ops.
* Changed to allow empty summary and description for ops.
* Avoided indenting description to allow proper MarkDown rendering of
formatting markers inside description content.
* Used fixed width font for operand/attribute names.
* Massaged TensorFlow op docs and generated dialect op doc.
PiperOrigin-RevId: 231427574
Similar to op operands and attributes, use DAG to specify operation's results.
This will allow us to provide names and matchers for outputs.
Also defined `outs` as a marker to indicate the start of the op result list.
PiperOrigin-RevId: 231422455
This CL also introduces a set of python bindings using pybind11. The bindings
are exercised using a `test_py2andpy3.py` test suite that works for both
python 2 and 3.
`test_py3.py` on the other hand uses the more idiomatic,
python 3 only "PEP 3132 -- Extended Iterable Unpacking" to implement a rank
and type-agnostic copy with transposition.
Because python assignment is by reference, we cannot easily make the
assignment operator use the same type of sugaring as in C++; i.e. the
following:
```cpp
Stmt block = edsc::Block({
For(ivs, zeros, shapeA, ones, {
C[ivs] = IA[ivs] + IB[ivs]
})});
```
has no equivalent in the native Python EDSCs at this time.
However, the sugaring can be built as a simple DSL in python and is left as
future work.
PiperOrigin-RevId: 231337667
Python modules cannot be defined under a directory that has a `-` character in its name inside of Google code.
Rename to `google_mlir` which circumvents this limitation.
PiperOrigin-RevId: 231329321
Update to allow constant attribute values to be used to match or as a result in rewrite rules. Define variable ctx in the matcher to allow matchers to refer to the context of the operation being matched.
PiperOrigin-RevId: 231322019
This CL adds support for calling EDSCs from other languages than C++.
Following the LLVM convention this CL:
1. declares simple opaque types and a C API in mlir-c/Core.h;
2. defines the implementation directly in lib/EDSC/Types.cpp and
lib/EDSC/MLIREmitter.cpp.
Unlike LLVM however the nomenclature for these types and API functions is not
well-defined; naming suggestions are most welcome.
To avoid the need for conversion functions, Types.h and MLIREmitter.h include
mlir-c/Core.h and provide constructors and conversion operators between the
mlir::edsc type and the corresponding C type.
In this first commit, mlir-c/Core.h only contains the types for the C API
to allow EDSCs to work from Python. This includes both a minimal set of core
MLIR types (mlir_context_t, mlir_type_t, mlir_func_t) as well as the EDSC types
(edsc_mlir_emitter_t, edsc_expr_t, edsc_stmt_t, edsc_indexed_t). This can be
restructured in the future as concrete needs arise.
For now, the API only supports:
1. scalar types;
2. memrefs of scalar types with static or symbolic shapes;
3. functions with input and output of these types.
The C API is not complete wrt ownership semantics. This is in large part due
to the fact that python bindings are written with Pybind11 which allows very
idiomatic C++ bindings. An effort is made to write a large chunk of these
bindings using the C API but some C++isms are used where the design benefits
from this simplification. A fully isolated C API will make more sense once we
also integrate with another language like Swift and have enough use cases to
drive the design.
Lastly, this CL also fixes a bug in mlir::ExecutionEngine where the order of
declaration of llvmContext and the JIT results in an improper order of
destructors (which used to crash before the fix).
PiperOrigin-RevId: 231290250
Similar to other tblgen:: abstractions, tblgen::Pattern hides the native TableGen
API and provides a nicer API that is more coherent with the TableGen definitions.
PiperOrigin-RevId: 231285143
* Matching an attribute and specifying an attribute constraint is the same thing executionally, so represent it as such.
* Extract AttrConstraint helper to match TypeConstraint and use that where mAttr was previously used in RewriterGen.
PiperOrigin-RevId: 231213580
Cleanup a usage of functional::map that is deemed too obscure in
`reindexAffineIndices`. Also fix a stale comment in `reindexAffineIndices`.
PiperOrigin-RevId: 231211184
Addresses b/122486036
This CL addresses some leftover crumbs in AffineMap and IntegerSet by removing
the Null method and cleaning up the constructors.
As the ::Null uses were tracked down, opportunities appeared to untangle some
of the Parsing logic and make it explicit where AffineMap/IntegerSet have
ambiguous syntax. Previously, ambiguous cases were hidden behind the implicit
pointer values of AffineMap* and IntegerSet* that were passed as function
parameters. Depending on the values of those pointers one of 3 behaviors could
occur.
This parsing logic convolution is one of the rare cases where I would advocate
for code duplication. The more proper fix would be to make the syntax
unambiguous or to allow some lookahead.
PiperOrigin-RevId: 231058512
This CL follows up on a memory leak issue related to SmallVector growth that
escapes the BumpPtrAllocator.
The fix is to properly use ArrayRef and placement new to define away the
issue.
The following renaming is also applied:
1. MLFunctionMatcher -> NestedPattern
2. MLFunctionMatches -> NestedMatch
As a consequence all allocations are now guaranteed to live on the BumpPtrAllocator.
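For illustration, the leak pattern being fixed looks roughly like this (a generic sketch, not the original code):
```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/Support/Allocator.h"
#include <new>

// The vector object itself lives in bump-allocated memory, but growing past
// the inline capacity switches it to heap storage that is never released:
// the BumpPtrAllocator frees its slabs wholesale and never runs ~SmallVector().
// Storing a fixed-size ArrayRef whose elements are placement-new'ed into the
// allocator avoids the issue.
void leakySketch(llvm::BumpPtrAllocator &allocator) {
  auto *vec = new (allocator.Allocate<llvm::SmallVector<int, 2>>())
      llvm::SmallVector<int, 2>();
  for (int i = 0; i < 100; ++i)
    vec->push_back(i); // grows beyond inline storage -> leaked heap memory
}
```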
PiperOrigin-RevId: 231047766
Example:
dma-generate options:
-dma-fast-mem-capacity - Set fast memory space ...
-dma-fast-mem-space=<uint> - Set fast memory space ...
loop-fusion options:
-fusion-compute-tolerance=<number> - Fractional increase in ...
-fusion-maximal - Enables maximal loop fusion
loop-tile options:
-tile-size=<uint> - Use this tile size for ...
loop-unroll options:
-unroll-factor=<uint> - Use this unroll factor ...
-unroll-full - Fully unroll loops
-unroll-full-threshold=<uint> - Unroll all loops with ...
-unroll-num-reps=<uint> - Unroll innermost loops ...
loop-unroll-jam options:
-unroll-jam-factor=<uint> - Use this unroll jam factor ...
PiperOrigin-RevId: 231019363
Use `-mlir-pretty-debuginfo` if the user wants line breaks between different callsite lines.
The print results before and after this CL are shown in the tests.
PiperOrigin-RevId: 231013812
index remapping
- generate a sequence of single result affine_apply's for the index remapping
(instead of one multi result affine_apply)
- update dma-generate and loop-fusion test cases; while on this, change test cases
to use single result affine apply ops
- some fusion comment fix/cleanup
PiperOrigin-RevId: 230985830
- Update createAffineComputationSlice to generate a sequence of single result
affine apply ops instead of one multi-result affine apply
- update pipeline-data-transfer test case; while on this, also update the test
case to use only single result affine maps, and make it more robust to
change.
PiperOrigin-RevId: 230965478
This commit introduces a generic dialect conversion/lowering/legalization pass
and illustrates it on StandardOps->LLVMIR conversion.
It partially reuses the PatternRewriter infrastructure and adds the following
functionality:
- an actual pass;
- non-default pattern constructors;
- one-to-many rewrites;
- rewriting terminators with successors;
- not applying patterns iteratively (unlike the existing greedy rewrite driver);
- ability to change function signature;
- ability to change basic block argument types.
The latter two things required, given the existing API, creating new functions
in the same module. Eventually, this should converge with the rest of
PatternRewriter. However, we may want to keep two pass versions: "heavy" with
function/block argument conversion and "light" that only touches operations.
This pass creates new functions within a module as a means to change function
signature, then creates new blocks with converted argument types in the new
function. Then, it traverses the CFG in DFS-preorder to make sure defs are
converted before uses in the dominated blocks. The generic pass has a minimal
interface with two hooks: one to fill in the set of patterns, and another one
to convert types for functions and blocks. The patterns are defined as
separate classes that can be table-generated in the future.
The LLVM IR lowering pass partially inherits from the existing LLVM IR
translator, in particular for type conversion. It defines a conversion pattern
template, instantiated for different operations, and is a good candidate for
tablegen. The lowering does not yet support loads and stores and is not
connected to the translator as it would have broken the existing flows. Future
patches will add missing support before switching the translator in a single
patch.
PiperOrigin-RevId: 230951202
This CL adds a new marker, replaceWithValue, to indicate that no new result
op is generated by applying a pattern. Instead, the matched DAG is replaced
by an existing SSA value.
Converted the tf.Identity converter to use the pattern.
PiperOrigin-RevId: 230922323
This implements a simple CPU runner based on LLVM Orc JIT. The base
functionality is provided by the ExecutionEngine class that compiles and links
the module, and provides an interface for obtaining function pointers to the
JIT-compiled MLIR functions and for invoking those functions directly. Since
function pointers need to be cast to the correct pointer type, the
ExecutionEngine wraps LLVM IR functions obtained from MLIR into a helper
function with the common signature `void (void **)` where the single argument
is interpreted as a list of pointers to the actual arguments passed to the
function, eventually followed by a pointer to the result of the function.
Additionally, the ExecutionEngine is set up to resolve library functions to
those available in the current process, enabling support for, e.g., simple C
library calls.
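For illustration, invoking such a wrapped function amounts to packing pointers as follows (a usage sketch of the calling convention described above, not the ExecutionEngine API itself; the wrapped `float f(int32_t)` function is hypothetical):
```cpp
#include <cstdint>

// The JIT-compiled wrapper has the uniform signature `void(void **)`: the
// array holds pointers to each actual argument, followed by a pointer to the
// slot receiving the result.
using WrappedFn = void (*)(void **);

float invokeWrapped(WrappedFn wrapper, int32_t arg) {
  float result = 0.0f;
  void *packedArgs[] = {&arg, &result};
  wrapper(packedArgs);
  return result;
}
```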
For integration purposes, this also provides a simplistic runtime for memref
descriptors as expected by the LLVM IR code produced by MLIR translation. In
particular, memrefs are transformed into LLVM structs (can be mapped to C
structs) with a pointer to the data, followed by dynamic sizes. This
implementation only supports statically-shaped memrefs of type float, but can
be extended if necessary.
Provide a binary for the runner and a test that exercises it.
PiperOrigin-RevId: 230876363
- introduce a way to compute union using symbolic rectangular bounding boxes
- handle multiple load/store op's to the same memref by taking a union of the regions
- command-line argument to provide capacity of the fast memory space
- minor change to replaceAllMemRefUsesWith to not generate affine_apply if the
supplied index remap was identity
PiperOrigin-RevId: 230848185
canonicalizations of operations. The ultimate user of this is
going to be a funcBuilder->foldOrCreate<YourOp>(...) API, but for now it
is just a more convenient way to write certain classes of canonicalizations
(see the change in StandardOps.cpp).
NFC.
PiperOrigin-RevId: 230770021
- switch some debug info to emitError
- use a single constant op for zero index to make it easier to write/update
test cases; avoid creating new constant op's for common zero index cases
- test case cleanup
This is in preparation for an upcoming major update to this pass.
PiperOrigin-RevId: 230728379
Example inline notation:
trailing-location ::= 'loc' '(' location ')'
// FileLineCol Location.
%1 = "foo"() : () -> i1 loc("mysource.cc":10:8)
// Name Location
return loc("foo")
// CallSite Location
return loc(callsite("foo" at "mysource.cc":19:9))
// Fused Location
/// Without metadata
func @inline_notation() loc(fused["foo", "mysource.cc":10:8])
/// With metadata
return loc(fused<"myPass">["foo", "foo2"])
// Unknown location.
return loc(unknown)
Locations are currently only printed with inline notation at the line of each instruction. Further work is needed to allow for reference notation, e.g:
...
return loc 1
}
...
loc 1 = "source.cc":10:1
PiperOrigin-RevId: 230587621
This CL just changes various docs and comments to use the term "generic" and
"custom" when mentioning assembly forms. To be consist, several methods are
also renamed:
* FunctionParser::parseVerboseOperation() -> parseGenericOperation()
* ModuleState::hasShorthandForm() -> hasCustomForm()
* OpAsmPrinter::printDefaultOp() -> printGenericOp()
PiperOrigin-RevId: 230568819
- update fusion cost model to fuse while tolerating a certain amount of redundant
computation; add cl option -fusion-compute-tolerance
- evaluate memory footprint and intermediate memory reduction
- emit debug info from -loop-fusion showing what was fused and why
- introduce function to compute memory footprint for a loop nest
- getMemRefRegion readability update - NFC
PiperOrigin-RevId: 230541857
This CL adds the Return op to EDSCs types and emitter.
This allows generating full function bodies that can be compiled all the way
down to LLVMIR and executed on CPU.
At this point, MLIR lacks the testing infrastructure to exercise this.
End-to-end testing of full functions written in EDSCs is left for a future CL.
PiperOrigin-RevId: 230527530
- unrolling a single iteration loop by a factor of one should promote its body
into its parent; this makes it consistent with the behavior/expectation that
unrolling a loop by a factor equal to its trip count makes the loop go away.
PiperOrigin-RevId: 230426499
- ForInst::walkOps will also be used in an upcoming CL (cl/229438679); better to have
this instead of deriving from the InstWalker
PiperOrigin-RevId: 230413820
- improve/fix doc comments for affine apply composition related methods.
- drop makeSingleValueComposedAffineApply - really redundant and out of line in
a public API; it's just returning the first result of the composed affine
apply op, and not making a single result affine map or an affine_apply op.
PiperOrigin-RevId: 230406169
- the size of the private memref created for the slice should be based on
the memref region accessed at the depth at which the slice is being
materialized, i.e., symbolic in the outer IVs up until that depth, as opposed
to the region accessed based on the entire domain.
- leads to a significant contraction of the temporary / intermediate memref
whenever the memref isn't reduced to a single scalar (through store fwd'ing).
Other changes
- update to promoteIfSingleIteration - avoid introducing unnecessary identity
map affine_apply from IV; makes it much easier to write and read test cases
and pass output for all passes that use promoteIfSingleIteration; loop-fusion
test cases become much simpler
- fix replaceAllMemrefUsesWith bug that was exposed by the above update -
'domInstFilter' could be one of the ops erased due to a memref replacement in
it.
- fix getConstantBoundOnDimSize bug: a division by the coefficient of the identifier was
missing (the latter need not always be 1); add lbFloorDivisors output argument
- rename getBoundingConstantSizeAndShape -> getConstantBoundingSizeAndShape
PiperOrigin-RevId: 230405218
*) Do not remove loop nests which write to memrefs which escape the function.
*) Do not remove memrefs which escape the function (e.g. are used in the return instruction).
PiperOrigin-RevId: 230398630
Add default values to attributes, to allow attributes to be left unspecified. The attr getter will always return an attribute so callers need not check for it; if the attribute is not set then the default will be returned (at present the default will be constructed upon query but this will be changed).
Add op definition for tf.AvgPool in ops.td, rewrite matcher using pattern using attribute matching & transforms. Adding some helper functions to make it simpler.
Handle attributes with dialect prefix and map them to getter without dialect prefix.
Note: VerifyAvgPoolOp could probably be autogenerated by now given the predicate specification on attributes, but deferring that to a follow-up.
PiperOrigin-RevId: 230364857
Start doc generation pass that generates simple markdown output. The output is formatted simply[1] in markdown, but this allows seeing what info we have, where we can refine the op description (e.g., the inputs is probably redundant), what info is missing (e.g., the attributes could probably have a description).
The formatting of the description is still left up to whatever was in the op definition (which luckily, due to the uniformity in the .td file, turned out well, but relying on the indentation there is fragile). The mechanism to autogenerate these after changes has not been added yet either. The output file could be run through a markdown formatter too to remove extra spaces.
[1]. This is not proposal for final style :) There could also be a discussion around single doc vs multiple (per dialect, per op), whether we want a TOC, whether operands/attributes should be headings or just formatted differently ...
PiperOrigin-RevId: 230354538
This is needed to allow binding to more constant types.
Tests that exercise this behavior will come in a followup CL.
In the meantime this does not break things.
PiperOrigin-RevId: 230320621
1) Fix FloatAttr type inconsistency in conversion from tf.FusedBatchNorm to TFLite ops
We used to compose the splat tensor out of the scalar epsilon attribute by using the
type of the variance operand. However, the epsilon attribute may have a different
bitwidth than the one in the variance operand. So we ended up creating
inconsistent types within the FloatAttr itself.
2) Fix SplatElementsAttr type inconsistency in AnnotateInputArrays
We need to create the zero-valued attribute according to the type provided as the
command-line arguments.
3) Concretize the result type of tf.Shape constant folding test case
Currently the resultant constant is created by the constant folding harness, using
the result type of the original op as the constant's result type. That can be
a different type than the constant's internal DenseElementsAttr.
PiperOrigin-RevId: 230244665
- print multiplication by -1 as unary negate; expressions like s0 * -1, d0 * -1
+ d1 will now appear as -s0, -d0 + d1 resp.
- a minor cleanup while on printAffineExprInternal
PiperOrigin-RevId: 230222151
This CL also makes ScopedEDSCContexts reset the Bindable numbering when
creating a new context.
This is useful to write minimal tests that don't use FileCheck pattern
captures for now.
PiperOrigin-RevId: 230079997
This CL performs a bunch of cleanups related to EDSCs that are generally
useful in the context of using them with a simple wrapping C API (not in this
CL) and with simple language bindings to Python and Swift.
PiperOrigin-RevId: 230066505
- detected with memref-bound-check
- fixes b/123072438; while on this, fix another test case which was reported
out of bounds
PiperOrigin-RevId: 229978187
*) Enables reduction of private memref size based on MemRef region accessed by fused slice.
*) Enables maximal fusion by creating a private memref to break a fusion-preventing dependence.
*) Adds maximal fusion flag to enable fusing as much as possible (though it still fuses the minimum cost computation slice).
PiperOrigin-RevId: 229936698
This CL adds a test reported by andydavis@ and fixes the corner case that
appears when operands do not come from an AffineApply and no Dim composition
is needed.
In such cases, we would need to create an empty map, which is disallowed.
The composition in such cases becomes trivial: there is no composition.
This CL also renames AffineNormalizer to AffineApplyNormalizer.
PiperOrigin-RevId: 229819234
Change MinMaxAttr to match hasValidMinMaxAttribute behavior. Once the other users of that function are rewritten, it could be removed too. The currently generated error message is:
error: 'tfl.fake_quant' op attribute 'minmax' failed to satisfy constraint of MinMaxAttr
PiperOrigin-RevId: 229775631
This CL fixes a misunderstanding in how to build DimOp which triggered
execution issues in the CPU path.
The problem is that, given a `memref<?x4x?x8x?xf32>`, the expressions to
construct the dynamic dimensions should be:
`dim %arg, 0 : memref<?x4x?x8x?xf32>`
`dim %arg, 2 : memref<?x4x?x8x?xf32>`
and
`dim %arg, 4 : memref<?x4x?x8x?xf32>`
Before this CL, we would construct:
`dim %arg, 0 : memref<?x4x?x8x?xf32>`
`dim %arg, 1 : memref<?x4x?x8x?xf32>`
`dim %arg, 2 : memref<?x4x?x8x?xf32>`
and expect the other dimensions to be constants.
This assumption seems consistent at first glance with the syntax of alloc:
```
%tensor = alloc(%M, %N, %O) : memref<?x4x?x8x?xf32>
```
But this was actually incorrect.
This CL also makes the relevant functions available to EDSCs and removes
duplication of the incorrect function.
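A minimal sketch of the corrected rule described above, with hypothetical names and no MLIR API calls. Given the shape of a memref such as memref<?x4x?x8x?xf32> (where dynamic sizes are encoded as -1), only the dynamic dimensions (indices 0, 2 and 4 here) need a `dim` operation; the others are compile-time constants.
```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
#include <cstdint>

// Hypothetical helper: returns the indices at which `dim` ops must be emitted.
static llvm::SmallVector<unsigned, 4>
getDynamicDimIndices(llvm::ArrayRef<int64_t> memrefShape) {
  llvm::SmallVector<unsigned, 4> indices;
  for (unsigned i = 0, e = memrefShape.size(); i < e; ++i)
    if (memrefShape[i] == -1) // Dynamic sizes are encoded as -1 in the shape.
      indices.push_back(i);
  return indices;
}
```
For the memref above this yields {0, 2, 4}, matching the three `dim` ops shown earlier, rather than the consecutive indices 0, 1, 2 constructed before this CL.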
PiperOrigin-RevId: 229622766
The operand and result types of binary ops are not necessarily the
same. For those binary ops, we cannot print the short-form assembly.
Enhance impl::printBinaryOp to consider operand and result types
when selecting which assembly form to use.
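A minimal sketch, with invented names, of the check this CL adds: the compact form elides per-operand types, so it is only valid when the result type matches every operand type; otherwise the printer must fall back to the generic (verbose) form.
```cpp
#include "mlir/IR/Types.h"
#include "llvm/ADT/ArrayRef.h"

// Hypothetical predicate used by a binary-op printer.
static bool canUseShortForm(llvm::ArrayRef<mlir::Type> operandTypes,
                            mlir::Type resultType) {
  for (mlir::Type type : operandTypes)
    if (type != resultType)
      return false; // Mixed types: the compact form would lose information.
  return true;
}
```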
PiperOrigin-RevId: 229608142
A recent change in TableGen definitions allowed arbitrary AND/OR predicate
compositions at the cost of removing known-true predicate simplification.
Introduce a more advanced simplification mechanism instead.
In particular, instead of folding predicate C++ expressions directly in
TableGen, keep them as is and build a predicate tree in the TableGen C++ library.
The predicate expression-substitution mechanism, necessary to implement complex
predicates for nested classes such as `ContainerType`, is replaced by a
dedicated predicate. This predicate appears in the predicate tree and can be
used for tree matching and separation. More specifically, subtrees defined
below such a predicate may be subject to different transformations than those
that appear above. For example, a subtree known to be true above the
substitution predicate is not necessarily true below it.
Use the predicate tree structure to eliminate known-true and known-false
predicates before code emission, as well as to collapse AND and OR predicates
if their value can be deduced based on the value of one child.
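A minimal sketch, with invented names (the actual implementation lives in the TableGen C++ library), of the AND-collapsing rule described above: a known-false child makes the whole conjunction false, and known-true children can be dropped before any C++ is emitted.
```cpp
#include <memory>
#include <string>
#include <vector>

enum class Kind { True, False, Leaf, And };

struct Pred {
  Kind kind;
  std::string cppExpr;                     // Only meaningful for Leaf.
  std::vector<std::shared_ptr<Pred>> kids; // Only meaningful for And.
};

static std::shared_ptr<Pred>
simplifyAnd(const std::vector<std::shared_ptr<Pred>> &kids) {
  auto result = std::make_shared<Pred>(Pred{Kind::And, "", {}});
  for (const auto &kid : kids) {
    if (kid->kind == Kind::False)
      return std::make_shared<Pred>(Pred{Kind::False, "", {}});
    if (kid->kind == Kind::True)
      continue; // Known-true children do not affect the conjunction.
    result->kids.push_back(kid);
  }
  if (result->kids.empty())
    return std::make_shared<Pred>(Pred{Kind::True, "", {}});
  return result;
}
```
The OR case is symmetric: a known-true child collapses the disjunction to true and known-false children are dropped.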
PiperOrigin-RevId: 229605997
Start simple with single predicate match & transform rules for attributes.
* It's unclear whether modelling Attr predicates will be needed, so start by allowing attributes to be matched with a single predicate.
* The input and output attr types often differ, so add the ability to specify a transform between the input and output formats.
PiperOrigin-RevId: 229580879
*) Adds support for fusing into consumer loop nests with multiple loads from the same memref.
*) Adds support for reducing slice loop trip count by projecting out destination loop IVs greater than destination loop depth.
*) Removes dependence on src loop depth and simplifies cost model computation.
PiperOrigin-RevId: 229575126
This is mostly plumbing to start allowing testing of EDSC lowering. Prototype specifying a reference implementation using the verbose format, without any generation/binding support. Add a test pass that dumps the constructed EDSC (of which there can only be one). The idea is to enable iterating from multiple sides; this is wrong on many dimensions at the moment.
PiperOrigin-RevId: 229570535
In TableGen definitions, the "Type" class has been used for types of things
that can be stored in Attributes, but not necessarily present in the MLIR type
system. As a consequence, records like "String" or "DerviedAttrBody" were of
class "Type", which can be confusing. Furthermore, the "builderCall" field of
the "Type" class serves only for attribute construction. Some TableGen "Type"
subclasses that correspond to MLIR kinds of types do not have a canonical way
of construction only from the data available in TableGen, e.g. MemRefType would
require the list of affine maps. This leads to the conclusion that the entities
that describe the types of objects appearing in Attributes should be independent of
"Type": they have some properties "Type"s don't, and vice versa.
Do not parameterize the TableGen "Attr" class by an instance of "Type". Instead,
provide a "constBuilderCall" field that can be used to build an attribute from
a constant value stored in TableGen instead of indirectly going through
Attribute.Type.builderCall. Some attributes still don't have a
"constBuilderCall" because they used to depend on types without a
"builderCall".
Drop definitions of class "Type" that don't correspond to MLIR Types. Provide
infrastructure to define type-dependent attributes and string-backed attributes
for convenience.
PiperOrigin-RevId: 229570087
We also need the broadcast logic in the TensorFlow dialect. Move it to a
Dialect/ directory for a broader scope. This Dialect/ directory is intended
for code not in core IR, but can potentially be shared by multiple dialects.
Apart from fixing TensorFlow op TableGen to use this trait, this CL only
contains mechanical code shuffling.
PiperOrigin-RevId: 229563911
The constant folding rules assume the value attributes of operands are already
verified to be in good standing.
For each op in the above, the constant folding rules support both integer and
floating point cases. Broadcast behavior is also supported as per the semantics
of TFLite ops.
This CL does not handle overflow/underflow cases yet.
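A minimal sketch, with invented names, of the numpy-style broadcast shape computation implied by the TFLite binary-op semantics mentioned above: dimensions are matched from the trailing end, and a size-1 dimension broadcasts against any size.
```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
#include <algorithm>
#include <cstdint>

// Hypothetical helper: returns false if the shapes are incompatible,
// otherwise fills `result` with the broadcasted shape.
static bool getBroadcastedShape(llvm::ArrayRef<int64_t> lhs,
                                llvm::ArrayRef<int64_t> rhs,
                                llvm::SmallVectorImpl<int64_t> &result) {
  auto l = lhs.rbegin(), r = rhs.rbegin();
  while (l != lhs.rend() || r != rhs.rend()) {
    int64_t a = (l != lhs.rend()) ? *l++ : 1; // Missing dims behave like 1.
    int64_t b = (r != rhs.rend()) ? *r++ : 1;
    if (a != b && a != 1 && b != 1)
      return false; // Incompatible dimensions.
    result.push_back(std::max(a, b));
  }
  std::reverse(result.begin(), result.end());
  return true;
}
```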
PiperOrigin-RevId: 229441221
LLVM IR types are defined using MLIR's extensible type system. The dialect
provides a single type kind, LLVMType, which wraps an llvm::Type*. Since LLVM
IR types are pointer-unique, the MLIR type system relies on those pointers to
perform its own type uniquing. Type parsing and printing are delegated to the
LLVM libraries.
Define MLIR operations for the LLVM IR instructions currently used by the
translation to the LLVM IR Target to simplify the eventual transition. Operation
classes are defined using TableGen. LLVM IR instruction operands that are only
allowed to take constant values are accepted as attributes instead. All
operations are using verbose form for printing and parsing.
PiperOrigin-RevId: 229400375
MLIR has support for type-polymorphic instructions, i.e. instructions that may
take arguments of different types. For example, standard arithmetic operations
take scalars, vectors or tensors. In order to express such instructions in
TableGen, we need to be able to verify that a type object satisfies certain
constraints, but we don't need to construct an instance of this type. The
existing TableGen definition of Type requires both. Extract out a
TypeConstraint TableGen class to define restrictions on types. Define the Type
TableGen class as a subclass of TypeConstraint for consistency. Accept records
of the TypeConstraint class instead of the Type class as values in the
Arguments class when defining operators.
Replace the predicate logic TableGen class based on conjunctive normal form
with the predicate logic classes allowing for arbitrary combinations of
predicates using Boolean operators (AND/OR/NOT). The combination is
implemented using simple string rewriting of C++ expressions and, therefore,
respects the short-circuit evaluation order. No logic simplification is
performed at the TableGen level so all expressions must be valid C++.
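A minimal sketch, with invented names, of the string-rewriting combination described above: each child predicate's C++ expression is parenthesized and joined with `&&`, so the emitted C++ naturally short-circuits left to right.
```cpp
#include "llvm/ADT/ArrayRef.h"
#include <string>

// Hypothetical helper: concatenates child predicate expressions into one
// C++ conjunction, preserving evaluation order.
static std::string combineAndPredicates(llvm::ArrayRef<std::string> exprs) {
  std::string result;
  for (const std::string &expr : exprs) {
    if (!result.empty())
      result += " && ";
    result += "(" + expr + ")";
  }
  // An empty conjunction is vacuously true.
  return result.empty() ? "true" : result;
}
```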
Maintaining CNF using TableGen only would have been complicated when one needed
to introduce top-level disjunction. It is also unclear whether it could lead to
significantly simpler emitted C++ code. In the future, we may replace the in-place
predicate string combination with a tree structure that can be simplified in
TableGen's C++ driver.
Combined, these changes allow one to express traits like ArgumentsAreFloatLike
directly in TableGen instead of relying on C++ trait classes.
PiperOrigin-RevId: 229398247
This allows load, store and ForNest to be used with both Expr and Bindable.
This simplifies writing generic MLIR snippets.
For instance, a generic pointwise add can now be written:
```cpp
// Different Bindable ivs, one per loop in the loop nest.
auto ivs = makeBindables(shapeA.size());
Bindable zero, one;
// Same bindable, all equal to `zero`.
SmallVector<Bindable, 8> zeros(ivs.size(), zero);
// Same bindable, all equal to `one`.
SmallVector<Bindable, 8> ones(ivs.size(), one);
// clang-format off
Bindable A, B, C;
Stmt scalarA, scalarB, tmp;
Stmt block = edsc::Block({
ForNest(ivs, zeros, shapeA, ones, {
scalarA = load(A, ivs),
scalarB = load(B, ivs),
tmp = scalarA + scalarB,
store(tmp, C, ivs)
}),
});
// clang-format on
```
This CL also adds some extra support for pretty printing that will be used in
a future CL when we introduce standalone testing of EDSCs. At the moment we
are lacking the basic infrastructure to write such tests.
PiperOrigin-RevId: 229375850
- this change is already consistent with the current code
- having no constraints made the integer set spec look odd - as nothing appears
between ':' and the closing parenthesis
- there is no loss in representational power - an unconstrained set can always
be represented by a trivially true constraint
PiperOrigin-RevId: 229307353
DenseElementsAttr currently does not support value bitwidths greater than 64. This can result in ASAN failures and crashes when trying to invoke DenseElementsAttr::writeBits/DenseElementsAttr::readBits.
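A minimal sketch, with a hypothetical helper name, of the kind of guard this implies: element bitwidths above 64 are rejected up front instead of reaching DenseElementsAttr::writeBits/readBits, which do not handle wider values.
```cpp
#include "mlir/IR/Types.h"

// Hypothetical predicate; assumes `elementType` is an integer or float type.
static bool isSupportedElementBitwidth(mlir::Type elementType) {
  return elementType.getIntOrFloatBitWidth() <= 64;
}
```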
PiperOrigin-RevId: 229241125
*) LoopFusion: Adds a fusion cost function which compares the cost of the fused loop nest with the cost of the two unfused loop nests to determine whether it is profitable to fuse the candidate loop nests. The fusion cost function is run for various combinations of src/dst loop depths, attempting to find the minimum-cost setting for src/dst loop depths which does not increase the computational cost when the loop nests are fused. Combinations of src/dst loop depth are evaluated attempting to maximize loop depth (i.e. take a bigger computation slice from the source loop nest, and insert it deeper in the destination loop nest for better locality). See the sketch after these bullets.
*) LoopFusion: Adds utility to compute op instance count for loop nests, sliced loop nests, and to compute the cost of a loop nest fused with another sliced loop nest.
*) LoopFusion: canonicalizes slice bound AffineMaps (and updates related tests).
*) Analysis::Utils: Splits getBackwardComputationSlice into two functions: one which calculates and returns the slice loop bounds for analysis by LoopFusion, and the other for insertion of the computation slice (once fusion has calculated the min-cost src/dst loop depths).
*) Test: Adds multiple unit tests to test the new functionality.
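A minimal sketch, with invented names, of the profitability test described in the first bullet: a (srcDepth, dstDepth) combination is only acceptable if the fused nest costs no more than the two nests left unfused, and among acceptable combinations the cheapest one is kept. `costOfFused` is a hypothetical stand-in for the op-instance-count based cost utility mentioned in the second bullet.
```cpp
#include <cstdint>
#include <limits>

struct FusionChoice {
  unsigned srcDepth = 0, dstDepth = 0;
  uint64_t cost = std::numeric_limits<uint64_t>::max();
};

static FusionChoice pickLoopDepths(unsigned maxSrcDepth, unsigned maxDstDepth,
                                   uint64_t unfusedCost,
                                   uint64_t (*costOfFused)(unsigned,
                                                           unsigned)) {
  FusionChoice best;
  for (unsigned s = 1; s <= maxSrcDepth; ++s)
    for (unsigned d = 1; d <= maxDstDepth; ++d) {
      uint64_t fusedCost = costOfFused(s, d);
      // Reject combinations that increase the computational cost.
      if (fusedCost <= unfusedCost && fusedCost < best.cost)
        best = FusionChoice{s, d, fusedCost};
    }
  return best;
}
```
The actual pass additionally biases toward deeper insertion points for locality; this sketch only shows the cost-comparison skeleton.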
PiperOrigin-RevId: 229219757