This CL fixes 2 recent issues with EDSCs:
1. the type of the LHS in Stmt::operator=(Expr rhs) should be the same as the (asserted unique) return type;
2. symbols coming from DimOp should be admissible as lower / upper bounds in For
The relevant tests are added.
PiperOrigin-RevId: 234750249
- compute slices precisely where the destination iteration depends on multiple source
iterations (instead of over-approximating to the whole source loop extent)
- update unionBoundingBox to deal with input with non-matching symbols
- reenable disabled backend test case
PiperOrigin-RevId: 234714069
- hoist DMAs past all loops immediately surrounding the region that the latter
is invariant on - do this at DMA generation time itself
PiperOrigin-RevId: 234628447
Originally, edsc::Expr had a long enum edsc::ExprKind with all supported types
of operations. Recent Expr extensibility support removed the need to specify
supported types in advance. Replace the no-longer-used blocks of enum values
reserved for unary/binary/ternary/variadic expressions with simple values (it
is still useful to know if an expression is, e.g., binary to access it through
a simpler API).
Furthermore, wrap string-comparison now used to identify specific ops into an
`Expr::is_op<>` function template, that acts similarly to `Instruction::isa<>`.
Introduce `{Unary,Binary,Ternary,Variadic}Expr::make<> ` function template that
creates a Expression emitting the MLIR Op specified as template argument.
PiperOrigin-RevId: 234612916
Expose the result types of edsc::Expr, which are now stored for all types of
Exprs and not only for the variadic ones. Require return types when an Expr is
constructed, if it will ever have some. An empty return type list is
interpreted as an Expr that does not create a value (e.g. `return` or `store`).
Conceptually, all edss::Exprs are now typed, with the type being a (potentially
empty) tuple of return types. Unbound expressions and Bindables must now be
constructed with a specific type they will take. This makes EDSC less
evidently type-polymorphic, but we can still write generic code such as
Expr sumOfSquares(Expr lhs, Expr rhs) { return lhs * lhs + rhs * rhs; }
and use it to construct different typed expressions as
sumOfSquares(Bindable(IndexType::get(ctx)), Bindable(IndexType::get(ctx)));
sumOfSquares(Bindable(FloatType::getF32(ctx)),
Bindable(FloatType::getF32(ctx)));
On the positive side, we get the following.
1. We can now perform type checking when constructing Exprs rather than during
MLIR emission. Nevertheless, this is still duplicates the Op::verify()
until we can factor out type checking from that.
2. MLIREmitter is significantly simplified.
3. ExprKind enum is only used for actual kinds of expressions. Data structures
are converging with AbstractOperation, and the users can now create a
VariadicExpr("canonical_op_name", {types}, {exprs}) for any operation, even
an unregistered one without having to extend the enum and make pervasive
changes to EDSCs.
On the negative side, we get the following.
1. Typed bindables are more verbose, even in Python.
2. We lose the ability to do print debugging for higher-level EDSC abstractions
that are implemented as multiple MLIR Ops, for example logical disjunction.
This is the step 2/n towards making EDSC extensible.
***
Move MLIR Op construction from MLIREmitter::emitExpr to Expr::build since Expr
now has sufficient information to build itself.
This is the step 3/n towards making EDSC extensible.
Both of these strive to minimize the amount of irrelevant changes. In
particular, this introduces more complex pretty-printing for affine and binary
expression to make sure tests continue to pass. It also relies on string
comparison to identify specific operations that an Expr produces.
PiperOrigin-RevId: 234609882
This CL extended TableGen Operator class to provide accessors for information on op
results.
In OpDefinitionGen, added checks to make sure only the last result can be variadic,
and adjusted traits and builders generation to consider variadic results.
PiperOrigin-RevId: 234596124
EDSC currently implement a block as a statement that is itself a list of
statements. This suffers from two modeling problems: (1) these blocks are not
addressable, i.e. one cannot create an instruction where thus constructed block
is a successor; (2) they support block nesting, which is not supported by MLIR
blocks. Furthermore, emitting such "compound statement" (misleadingly named
`Block` in Python bindings) does not actually produce a new Block in the IR.
Implement support for creating actual IR Blocks in EDSC. In particular, define
a new StmtBlock EDSC class that is neither an Expr nor a Stmt but contains a
list of Stmts. Additionally, StmtBlock may have (early-) typed arguments.
These arguments are Bindable expressions that can be used inside the block.
Provide two calls in the MLIREmitter, `emitBlock` that actually emits a new
block and `emitBlockBody` that only emits the instructions contained in the
block without creating a new block. In the latter case, the instructions must
not use block arguments.
Update Python bindings to make it clear when instruction emission happens
without creating a new block.
PiperOrigin-RevId: 234556474
The parameter to emitStandaloneParamBuilder() was renamed from hasResultType to
isAllSameType, which is the opposite boolean value. The logic should be changed
to make them consistent.
Also re-ordered some methods in Operator. And few other tiny improvements.
PiperOrigin-RevId: 234478316
generation pass to make it drop certain assumptions, complete TODOs.
- multiple fixes for getMemoryFootprintBytes
- pass loopDepth correctly from getMemoryFootprintBytes()
- use union while computing memory footprints
- bug fixes for addAffineForOpDomain
- take into account loop step
- add domains of other loop IVs in turn that might have been used in the bounds
- dma-generate: drop assumption of "non-unit stride loops being tile space loops
and skipping those and recursing to inner depths"; DMA generation is now purely
based on available fast mem capacity and memory footprint's calculated
- handle memory region compute failures/bailouts correctly from dma-generate
- loop tiling cleanup/NFC
- update some debug and error messages to use emitNote/emitError in
pipeline-data-transfer pass - NFC
PiperOrigin-RevId: 234245969
The existing implementation of makeFunctionType in EDSC contains a bug: the
array of input types is overwritten using output types passed as arguments and
the array of output types is never filled in. This leads to all sorts of
incorrect memory behavior. Fill in the array of output types using the proper
argument.
PiperOrigin-RevId: 234177221
A recent change introduced a possibility to run LLVM IR transformation during
JIT-compilation in the ExecutionEngine. Provide helper functions that
construct IR transformers given either clang-style optimization levels or a
list passes to run. The latter wraps the LLVM command line option parser to
parse strings rather than actual command line arguments. As a result, we can
run either of
mlir-cpu-runner -O3 input.mlir
mlir-cpu-runner -some-mlir-pass -llvm-opts="-llvm-pass -other-llvm-pass"
to combine different transformations. The transformer builder functions are
provided as a separate library that depends on LLVM pass libraries unlike the
main execution engine library. The library can be used for integrating MLIR
execution engine into external frameworks.
PiperOrigin-RevId: 234173493
*) Adds utility to LoopUtils to perform loop interchange of two AffineForOps.
*) Adds utility to LoopUtils to sink a loop to a specified depth within a loop nest, using a series of loop interchanges.
*) Computes dependences between all loads and stores in the loop nest, and classifies each loop as parallel or sequential.
*) Computes loop interchange permutation required to sink sequential loops (and raise parallel loop nests) while preserving relative order among them.
*) Checks each dependence against the permutation to make sure that dependences would not be violated by the loop interchange transformation.
*) Calls loop interchange in LoopFusion pass on consumer loop nests before fusing in producers, sinking loops with loop carried dependences deeper into the consumer loop nest.
*) Adds and updates related unit tests.
PiperOrigin-RevId: 234158370
We specify op operands and results in TableGen op definition using the same syntax.
They should be modelled similarly in TableGen driver wrapper classes.
PiperOrigin-RevId: 234153332
Add support for converting MLIR `call_indirect` instructions to the LLVM IR
dialect. In LLVM IR, the same instruction is used for direct and indirect
calls. In the dialect, we have `llvm.call` and `llvm.call0` to work around the
absence of the void type in MLIR. For direct calls, the callee is stored as
instruction attribute. Use the same pair of instructions for indirect calls by
omitting the callee attribute. In the MLIR to LLVM IR translator, check the
presence of attribute to decide whether to construct a direct or an indirect
call using different LLVM IR Builder functions.
Add support for converting constants of function type to the LLVM IR dialect
and for translating them to the LLVM IR proper. The `llvm.constant` operation
works similarly to other types: its attribute has MLIR function type but the
value it produces has LLVM IR function type wrapped in the dialect type. While
lowering, look up the pointer to the converted function in the corresponding
mapping.
PiperOrigin-RevId: 234132351
Function types are built-in in MLIR and affect the validity of the IR itself.
However, advanced target dialects such as the LLVM IR dialect may include
custom function types. Until now, dialect conversion was expecting function
types not to be converted to the custom type: although the signatures was
allowed to change, the outer type must have been an mlir::FunctionType. This
effectively prevented dialect conversion from creating instructions that
operate on values of the custom function type.
Dissociate function signature conversion from general type conversion.
Function signature conversion must still produce an mlir::FunctionType and is
used in places where built-in types are required to make IR valid. General
type conversion is used for SSA values, including function and block arguments
and function results.
Exercise this behavior in the LLVM IR dialect conversion by converting function
types to LLVM IR function pointer types. The pointer to a function is chosen
to provide consistent lowering of higher-order functions: while it is possible
to have a value of function type, it is not possible to create a function type
accepting a returning another function type.
PiperOrigin-RevId: 234124494
Update FlatAffineConstraints::getLower/UpperBounds to project to the identifier for which bounds are being computed. This change enables computing bounds on an identifier which were previously dependent on the bounds of another identifier.
PiperOrigin-RevId: 234017514
for dma-generate, loop-unroll.
- add -tile-sizes command line option for loop tiling to specify different tile
sizes for loops in a band
- clean up command line options for loop-unroll, dma-generate (remove
cl::hidden)
PiperOrigin-RevId: 234006232
If we see an add op adding a constant value to a convolution op with constant
bias, we can fuse the add into the convolution op by constant folding the
bias and the add op's constant operand.
This CL also removes dangling RewriterGen check that prevents us from using
nested DAG nodes in result patterns, which is already supported.
PiperOrigin-RevId: 233989654
For ops with the SameOperandsAndResultType trait, we know that all result types
should be the same as the first operand's type. So we can generate a build()
method without requiring result types as parameters and also invoke this method
when constructing such ops during expanding rewrite patterns.
Similarly for ops have broadcast behavior, we can define build() method to use
the deduced type as the result type. So we can also calling into this build()
method when constructing ops in RewriterGen.
PiperOrigin-RevId: 233988307
In LowerEDSCTestPass, there are two range-for loops that only do assertions on
the loop variable. With assertions disabled, the variable becomes unused and
triggers a warning promoted to error. Cast it to void in the loop to supress
the warning.
PiperOrigin-RevId: 233936171
Original implementation of the translation from MLIR to LLVM IR operated on the
Standard+BuiltIn dialect, with a later addition of the SuperVector dialect.
This required the translation to be aware of a potetially large number of other
dialects as the infrastructure extended. With the recent introduction of the
LLVM IR dialect into MLIR, the translation can be switched to only translate
the LLVM IR dialect, and the translation of the operations becomes largely
mechanical.
The reimplementation of the translator follows the lines of the original
translator in function and basic block conversion. In particular, block
arguments are converted to LLVM IR PHI nodes, which are connected to their
sources after all blocks of a function had been converted. Thanks to LLVM IR
types being wrapped in the MLIR LLVM dialect type, type conversion is
simplified to only convert function types, all other types are simply
unwrapped. Individual instructions are constructed using the LLVM IRBuilder,
which has a great potential for being table-generated from the LLVM IR dialect
operation definitions.
The input of the test/Target/llvmir.mlir is updated to use the MLIR LLVM IR
dialect. While it is now redundant with the dialect conversion test, the point
of the exercise is to guarantee exactly the same LLVM IR is emitted. (Only the
name of the allocation function is changed from `__mlir_alloc` to `alloc` in
the CHECK lines.) It will be simplified in a follow-up commit.
PiperOrigin-RevId: 233842306
EDSC expressions evolved to have different types of underlying storage.
Separate classes are used for unary, binary, ternary and variadic expressions.
The latter covers all the needs of the three special cases. Remove these
special cases and use a single ExprStorage class everywhere while maintaining
the same APIs at the Expr level (ExprStorage is an internal implementation
class).
This is step 1/n to converging EDSC expressions and Ops and making EDSCs
support custom operations.
PiperOrigin-RevId: 233704912
DimOp is converted to a constant LLVM IR dialect operation for static
dimensions and to an access to the dynamic size info stored in the memref
descriptor for the dynamic dimensions. This behavior is consistent with the
existing mlir-translator.
This completes the porting of MLIR -> LLVM lowering to the dialect conversion
infrastructure.
PiperOrigin-RevId: 233665634
- for the DMA transfers being pipelined through double buffering, generate
deallocs for the double buffers being alloc'ed
This change is along the lines of cl/233502632. We initially wanted to experiment with
scoped allocation - so the deallocation's were usually not necessary; however, they are
needed even with scoped allocations in some situations - for eg. when the enclosing loop
gets unrolled. The dealloc serves as an end of lifetime marker.
PiperOrigin-RevId: 233653463
In the current state, edsc::Expr and edsc::Stmt overload operators to construct
other Exprs and Stmts. This includes some unconventional overloads of the
`operator==` to create a comparison expression and of the `operator!` to create
a negation expression. This situation could lead to unpleasant surprises where
the code does not behave like expected. Make all Expr and Stmt construction
operators free functions and move them to the `edsc::op` namespace. Callers
willing to use these operators must explicitly include them with the `using`
declaration. This can be done in some local scope.
Additionally, we currently emit signed comparisons for order-comparison
operators. With namespaces, we can later introduce two sets of operators in
different namespace, e.g. `edsc::op::sign` and `edsc::op::unsign` to clearly
state which kind of comparison is implied.
PiperOrigin-RevId: 233578674
The existing implementation in EDSC of 'for' loops in MLIREmitter is
unnecessarily restricted to constant bounds. The underlying AffineForOp can be
constructed from (a list of) Values and AffineMaps instead of constants. Its
verifier will check that the "affine provenance" conditions, i.e. that the
values used in the loop conditions are defined in such a way that they can be
analyzed by affine passes, are respected. One can use non-constant values in
affine loop bounds in conjunction with a single-dimensional identity affine
map. Implement this in MLIREmitter while maintaining the special case for
constant bounds that leads to significantly simpler generated IR when
applicable.
Test this change using the EDSC lowering test pass to inject code emitted from
EDSC into functions with predefined names.
PiperOrigin-RevId: 233578220
Associates opaque constants with a particular dialect. Adds general mechanism to register dialect-specific hooks defined in external components. Adds hooks to decode opaque tensor constant and extract an element of an opaque tensor constant.
This CL does not change the existing mechanism for registering constant folding hook yet. One thing at a time.
PiperOrigin-RevId: 233544757
- for the DMA buffers being allocated (and their tags), generate corresponding deallocs
- minor related update to replaceAllMemRefUsesWith and PipelineDataTransfer pass
Code generation for DMA transfers was being done with the initial simplifying
assumption that the alloc's would map to scoped allocations, and so no
deallocations would be necessary. Drop this assumption to generalize. Note that
even with scoped allocations, unrolling loops that have scoped allocations
could create a series of allocations and exhaustion of fast memory. Having a
end of lifetime marker like a dealloc in fact allows creating new scopes if
necessary when lowering to a backend and still utilize scoped allocation.
DMA buffers created by -dma-generate are guaranteed to have either
non-overlapping lifetimes or nested lifetimes.
PiperOrigin-RevId: 233502632
- determine symbols for the memref region correctly
- this wasn't exposed earlier since we didn't have any test cases where the
portion of the nest being DMAed for was non-hyperrectangular (i.e., bounds of
one IV depending on other IVs within that part)
PiperOrigin-RevId: 233493872
* Add common broadcastable binary adder in TF ops and use for a few ops;
- Adding Sub, Mul here
* Change the prepare lowering to use TF variants;
* Add some more legalization patterns;
PiperOrigin-RevId: 233310952
* Fixed tfl.conv_2d and tfl.depthwise_conv_2d to have fused activation
function attribute
* Fixed RewriterGen crash: trying to get attribute match template when
the matcher is unspecified (UnsetInit)
PiperOrigin-RevId: 233241755
This CL allowed developers to write result ops having nested DAG nodes as their
arguments. Now we can write
```
def : Pat<(...), (AOp (BOp, ...), AOperand)>
```
PiperOrigin-RevId: 233207225
Previously we were using PatternRewrite::replaceOpWithNewOp() to both create the new op
inline and rewrite the matched op. That does not work well if we want to generate multiple
ops in a sequence. To support that, this CL changed to assign each newly created op to a
separate variable.
This CL also refactors how PatternEmitter performs the directive dispatch logic.
PiperOrigin-RevId: 233206819
Add support for converting `memref_cast` operations into the LLVM IR dialect.
This goes beyond want is currently implemented in the MLIR standard ops to LLVM
IR translation, but follows the general principles of the memref descriptors.
A memref cast creates a new descriptor containing the same buffer pointer but a
potentially different number of dynamic sizes (as many as dynamic dimensions in
the target memref type). The lowering copies the buffer pointer to the new
descriptor and inserts dynamic sizes to it. If the size is static in the
source type, a constant value is inserted as the dynamic size, otherwise a
dynamic value is copied from the source descriptor, taking into account the
difference in dynamic size positions in the descriptor.
PiperOrigin-RevId: 233082035
Make sure the module is always passed to the optimization layer.
Drop unused default argument for the IR transformation and remove the function
that was only used in this default argument. The transformation wrapper
constructor already checks for the null function, so the caller can just pass
`{}` if they don't want any transformation (no callers currently need this).
PiperOrigin-RevId: 233068817
Implement the lowering of memref load and store standard operations into the
LLVM IR dialect. This largely follows the existing mechanism in
MLIR-to-LLVM-IR translation for the sake of compatibility. A memref value is
transformed into a memref descriptor value which holds the pointer to the
underlying data buffer and the dynamic memref sizes. The data buffer is
contiguous. Accesses to multidimensional memrefs are linearized in row-major
form. In linear address computation, statically known sizes are used as
constants while dynamic sizes are extracted from the memref descriptor.
PiperOrigin-RevId: 233043846
That allows TensorFlow Add and Div ops to use Broadcastable op trait instead of
more restrictive SameValueType op trait.
That in turn allows TensorFlow ops to be registered by defining GET_OP_LIST and
including the generated ops file. Currently, tf-raise-control-flow pass tests
are using dynamic shapes in tf.Add op and AddOp can't be registered without
supporting the dynamic shapes.
TESTED with unit tests
PiperOrigin-RevId: 232927998