This CL applies the following simplifications to EDSCs:
1. Rename Block to StmtList because an MLIR Block is a different, not yet
supported, notion;
2. Rework Bindable to drop specific storage and just use it as a simple wrapper
around Expr. The only value of Bindable is to force a static cast when used by
the user to bind into the emitter. For all intents and purposes, Bindable is just
a lightweight check that an Expr is Unbound. This simplifies usage and reduces
the API footprint. After playing with it for some time, the previous design was
not worth the API cognition overhead;
3. Replace makeExprs and makeBindables by makeNewExprs and copyExprs, which are
more explicit and harder to misuse;
4. Add generally useful functionality to MLIREmitter:
a. expose zero and one, the ubiquitous lower bound and step values;
b. add support to create already bound Exprs for all function arguments as
well as shapes and views for Exprs bound to memrefs.
5. Delete Stmt::operator= and replace it with a `Stmt::set` method, which is more
explicit.
6. Make Stmt::operator Expr() explicit.
7. Indexed.indices assertions are removed to pave the way for expressing slices
and views as well as to work with 0-D memrefs.
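A minimal sketch of items 2, 5 and 6, assuming hypothetical toy classes that merely borrow the EDSC names (they are not the actual EDSC headers): `Bindable` adds no storage of its own and only checks on construction that the wrapped `Expr` is unbound, `Stmt::set` takes the place of `operator=`, and turning a `Stmt` into an `Expr` requires an explicit cast.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy Expr (hypothetical): a handle to a shared node that is either unbound or
// bound to some value, represented here as a string.
class Expr {
public:
  Expr() : node(std::make_shared<Node>()) {}
  bool isUnbound() const { return !node->bound; }
  void bindTo(const std::string &value) {
    node->bound = true;
    node->value = value;
  }
  const std::string &value() const { return node->value; }

private:
  struct Node {
    bool bound = false;
    std::string value;
  };
  std::shared_ptr<Node> node;
};

// Toy Bindable (hypothetical): no storage of its own, only a check that the
// wrapped Expr is still unbound when the wrapper is created.
class Bindable : public Expr {
public:
  explicit Bindable(const Expr &e) : Expr(e) {
    assert(e.isUnbound() && "Bindable must wrap an unbound Expr");
  }
};

// Toy Stmt (hypothetical): `set` replaces operator=, and the conversion to the
// Expr that names the statement's result is explicit.
class Stmt {
public:
  Stmt &set(const Expr &value) {
    rhs = value;
    return *this;
  }
  const Expr &getRHS() const { return rhs; }
  explicit operator Expr() const { return lhs; }

private:
  Expr lhs; // fresh unbound Expr that names the result of the statement
  Expr rhs; // expression computing that result
};
```

Because `Bindable` shares the node of the `Expr` it wraps, binding the `Expr` later is visible through the wrapper; the cast `static_cast<Expr>(stmt)` is the only way to use a `Stmt` where an `Expr` is expected.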
This CL plugs these simplifications into TableGen and allows emitting a full
MLIR function for pointwise add.
This "x.add" op is both type- and rank-agnostic (by allowing an ArrayRef of Expr
passed to For loops) and opens the door to spinning up a composable library of
existing and custom ops that should automate a lot of the tedious work in
TF/XLA -> MLIR.
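To illustrate what rank-agnostic means here, the following plain C++ sketch (not EDSC code; `pointwiseAdd` is a hypothetical helper and a row-major layout is assumed) drives a loop nest of arbitrary depth with a vector of induction variables, in the spirit of passing an `ArrayRef` of `Expr` to `For`. An empty shape yields a single element, as for a 0-D memref.

```cpp
#include <cstddef>
#include <vector>

// Rank-agnostic pointwise add (hypothetical helper): C = A + B for row-major
// buffers of any rank. `shape` plays the role of the upper bounds; all lower
// bounds are 0 and all steps are 1.
template <typename T>
std::vector<T> pointwiseAdd(const std::vector<T> &A, const std::vector<T> &B,
                            const std::vector<std::size_t> &shape) {
  std::size_t total = 1;
  for (std::size_t extent : shape) total *= extent;
  std::vector<T> C(total);

  // One induction variable per loop of the nest, however many loops there are.
  std::vector<std::size_t> ivs(shape.size(), 0);
  for (std::size_t count = 0; count < total; ++count) {
    // Linearize the multi-dimensional index (row-major).
    std::size_t linear = 0;
    for (std::size_t d = 0; d < shape.size(); ++d)
      linear = linear * shape[d] + ivs[d];
    C[linear] = A[linear] + B[linear];
    // Step the innermost induction variable, carrying into the outer ones.
    for (std::size_t d = shape.size(); d-- > 0;) {
      if (++ivs[d] < shape[d]) break;
      ivs[d] = 0;
    }
  }
  return C;
}
```

The element type is a template parameter and the loop depth follows `shape.size()`, which is the sense in which the generated op is both type- and rank-agnostic.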
Testing needs to be significantly improved but can be done in a separate CL.
PiperOrigin-RevId: 231982325
This CL addresses some cleanups that were left over after an incorrect rebase:
1. use StringSwitch
2. use // NOLINTNEXTLINE
3. remove a dead line of code
PiperOrigin-RevId: 231726640
This CL also introduces a set of Python bindings using pybind11. The bindings
are exercised using a `test_py2andpy3.py` test suite that works for both
Python 2 and 3.
`test_py3.py`, on the other hand, uses the more idiomatic,
Python 3-only "PEP 3132 -- Extended Iterable Unpacking" to implement a rank-
and type-agnostic copy with transposition.
Because Python assignment is by reference, we cannot easily give the
assignment operator the same kind of sugaring as in C++, e.g. the
following:
```cpp
Stmt block = edsc::Block({
  For(ivs, zeros, shapeA, ones, {
    C[ivs] = IA[ivs] + IB[ivs]
  })});
```
has no equivalent in the native Python EDSCs at this time.
However, the sugaring can be built as a simple DSL in Python and is left as
future work.
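For comparison, a toy sketch of the C++ mechanism behind that sugaring, using hypothetical string-building `Expr`, `Stmt`, `IndexedAccess` and `Indexed` types rather than the real EDSC classes: `operator[]` returns a proxy whose `operator=` returns a `Stmt` value, so `C[ivs] = IA[ivs] + IB[ivs]` is an expression and can appear inside a braced list. In Python, `C[ivs] = ...` is a statement (it may call `__setitem__`, but yields no value), so it cannot be placed inside a list literal.

```cpp
#include <string>
#include <vector>

// Toy expression (hypothetical): just carries its textual form.
struct Expr {
  std::string text;
};

inline Expr operator+(const Expr &a, const Expr &b) {
  return Expr{"(" + a.text + " + " + b.text + ")"};
}

// Toy statement (hypothetical) produced by an indexed assignment.
struct Stmt {
  std::string text;
};

// Proxy returned by Indexed::operator[]: reading it yields an Expr, while
// assigning to it yields a Stmt value instead of mutating anything.
struct IndexedAccess {
  std::string name;
  std::string index;
  operator Expr() const { return Expr{name + "[" + index + "]"}; }
  Stmt operator=(const Expr &value) const {
    return Stmt{"store " + value.text + ", " + name + "[" + index + "]"};
  }
};

struct Indexed {
  std::string name;
  IndexedAccess operator[](const std::string &ivs) const { return {name, ivs}; }
};
```

With these definitions, `std::vector<Stmt> body = {C["i"] = A["i"] + B["i"]};` compiles and records a single store statement.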
PiperOrigin-RevId: 231337667
This CL adds support for calling EDSCs from languages other than C++.
Following the LLVM convention this CL:
1. declares simple opaque types and a C API in mlir-c/Core.h;
2. defines the implementation directly in lib/EDSC/Types.cpp and
lib/EDSC/MLIREmitter.cpp.
Unlike in LLVM, however, the naming conventions for these types and API
functions are not yet well defined; naming suggestions are most welcome.
To avoid the need for conversion functions, Types.h and MLIREmitter.h include
mlir-c/Core.h and provide constructors and conversion operators between the
mlir::edsc type and the corresponding C type.
In this first commit, mlir-c/Core.h only contains the types for the C API
to allow EDSCs to work from Python. This includes both a minimal set of core
MLIR types (mlir_context_t, mlir_type_t, mlir_func_t) as well as the EDSC types
(edsc_mlir_emitter_t, edsc_expr_t, edsc_stmt_t, edsc_indexed_t). This can be
restructured in the future as concrete needs arise.
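The pairing of an opaque C type with C++ constructors and conversion operators can be sketched as follows. Only the name `edsc_expr_t` comes from this CL; its definition as an opaque `const void *`, the `sketch::Expr` class and the `edsc_expr_is_null` function are hypothetical and exist only for illustration.

```cpp
// ---- C side: what a C header such as mlir-c/Core.h might declare ----------
extern "C" {
// Opaque handle (hypothetical representation): C callers never see the layout
// of the C++ object behind it.
typedef const void *edsc_expr_t;
// Hypothetical C entry point that operates on the opaque handle.
int edsc_expr_is_null(edsc_expr_t e);
}

// ---- C++ side ---------------------------------------------------------------
namespace sketch {
struct ExprStorage {
  int id;
};

class Expr {
public:
  explicit Expr(const ExprStorage *storage) : storage(storage) {}
  // Construct from the C handle...
  Expr(edsc_expr_t e) : storage(static_cast<const ExprStorage *>(e)) {}
  // ...and convert back to it, so no named conversion functions are needed.
  operator edsc_expr_t() const { return storage; }
  int id() const { return storage->id; }

private:
  const ExprStorage *storage;
};
} // namespace sketch

extern "C" int edsc_expr_is_null(edsc_expr_t e) { return e == nullptr; }
```

Since both directions are implicit, C++ code can pass a `sketch::Expr` straight to a C function taking `edsc_expr_t`, and vice versa.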
For now, the API only supports:
1. scalar types;
2. memrefs of scalar types with static or symbolic shapes;
3. functions with input and output of these types.
The C API is not complete with respect to ownership semantics. This is in
large part due to the fact that the Python bindings are written with pybind11,
which allows very idiomatic C++ bindings. An effort is made to write a large
chunk of these bindings using the C API, but some C++isms are used where the
design benefits from this simplification. A fully isolated C API will make more
sense once we also integrate with another language like Swift and have enough
use cases to drive the design.
Lastly, this CL also fixes a bug in mlir::ExecutionEngine where the
declaration order of llvmContext and the JIT resulted in the destructors
running in the wrong order (which used to crash before the fix).
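For reference, C++ destroys non-static data members in the reverse order of their declaration, so a member that refers to another must be declared after it. The sketch below uses hypothetical `Context`, `Jit` and `Engine` stand-ins (not the actual ExecutionEngine members) and logs destructor calls to show the safe ordering.

```cpp
#include <string>
#include <vector>

// Records the order in which destructors run.
std::vector<std::string> destructionLog;

// Hypothetical stand-in for an LLVM context.
struct Context {
  ~Context() { destructionLog.push_back("context"); }
};

// Hypothetical stand-in for a JIT that refers to the context it was built with.
struct Jit {
  explicit Jit(Context &ctx) : ctx(&ctx) {}
  ~Jit() { destructionLog.push_back("jit"); }
  Context *ctx; // must not dangle while ~Jit runs
};

// The context is declared first, so it is constructed first and destroyed
// last, i.e. after the JIT that uses it.
struct Engine {
  Context context;
  Jit jit{context};
};
```

Swapping the two member declarations in `Engine` would destroy the context while the JIT still points to it.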
PiperOrigin-RevId: 231290250
This CL adds the Return op to EDSCs types and emitter.
This allows generating full function bodies that can be compiled all the way
down to LLVM IR and executed on CPU.
At this point, MLIR lacks the testing infrastructure to exercise this.
End-to-end testing of full functions written in EDSCs is left for a future CL.
PiperOrigin-RevId: 230527530
- improve/fix doc comments for affine apply composition related methods.
- drop makeSingleValueComposedAffineApply: it is redundant and out of place in
a public API; it just returns the first result of the composed affine
apply op, and does not build a single-result affine map or an affine_apply op.
PiperOrigin-RevId: 230406169
This CL also makes ScopedEDSCContext reset the Bindable numbering when
creating a new context.
This is useful for writing minimal tests that do not rely on FileCheck pattern
captures for now.
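A sketch of the mechanism, using a hypothetical `ScopedContext` RAII class and a hypothetical global counter (not the actual `ScopedEDSCContext` implementation): creating a new scope restarts the numbering, so every test prints the same names and can match them literally instead of capturing them with FileCheck.

```cpp
#include <string>

// Hypothetical global counter used to name unbound expressions when printing.
static unsigned nextBindableId = 0;

// Each new bindable takes the next id in the current numbering.
inline std::string makeBindableName() {
  return "b" + std::to_string(nextBindableId++);
}

// RAII scope: creating a new context resets the numbering to zero.
class ScopedContext {
public:
  ScopedContext() { nextBindableId = 0; }
  ScopedContext(const ScopedContext &) = delete;
  ScopedContext &operator=(const ScopedContext &) = delete;
};
```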
PiperOrigin-RevId: 230079997
This CL performs a bunch of cleanups related to EDSCs that are generally
useful in the context of using them with a simple wrapping C API (not in this
CL) and with simple language bindings to Python and Swift.
PiperOrigin-RevId: 230066505
This CL fixes a misunderstanding in how to build DimOp which triggered
execution issues in the CPU path.
The problem is that, given a `memref<?x4x?x8x?xf32>`, the expressions to
construct the dynamic dimensions should be:
`dim %arg, 0 : memref<?x4x?x8x?xf32>`
`dim %arg, 2 : memref<?x4x?x8x?xf32>`
and
`dim %arg, 4 : memref<?x4x?x8x?xf32>`
Before this CL, we would construct:
`dim %arg, 0 : memref<?x4x?x8x?xf32>`
`dim %arg, 1 : memref<?x4x?x8x?xf32>`
`dim %arg, 2 : memref<?x4x?x8x?xf32>`
and expect the other dimensions to be constants.
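This assumption seems consistent at first glance with the syntax of alloc:
```mlir
%tensor = alloc(%M, %N, %O) : memref<?x4x?x8x?xf32>
```
But this was actually incorrect: the operands of alloc only provide the dynamic
sizes, in order, whereas the index operand of dim refers to the position of the
dimension in the full shape, static dimensions included.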
This CL also makes the relevant functions available to EDSCs and removes
duplication of the incorrect function.
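The index arithmetic behind the fix, sketched in isolation: encoding `?` as `-1` and the `dynamicDimPositions` helper are assumptions made for illustration. For each dynamic size, taken in the order of the alloc operands, the `dim` index is the position of that size in the full shape.

```cpp
#include <cstdint>
#include <vector>

constexpr int64_t kDynamic = -1; // assumed encoding of `?` in this sketch

// Returns the positions to pass to `dim` for the dynamic sizes of `shape`,
// in the same order as the operands of the corresponding `alloc`.
inline std::vector<unsigned> dynamicDimPositions(const std::vector<int64_t> &shape) {
  std::vector<unsigned> positions;
  for (unsigned i = 0; i < shape.size(); ++i)
    if (shape[i] == kDynamic) positions.push_back(i);
  return positions;
}
```

For `memref<?x4x?x8x?xf32>` this yields positions 0, 2 and 4, matching the corrected `dim` operations above.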
PiperOrigin-RevId: 229622766
This is mostly plumbing to start allowing EDSC lowering to be tested. It
prototypes specifying a reference implementation using a verbose format,
without any generation/binding support, and adds a test pass that dumps the
constructed EDSC (of which there can only be one). The idea is to enable
iterating from multiple sides; this is wrong in many respects at the moment.
PiperOrigin-RevId: 229570535
This allows load, store and ForNest to be used with both Expr and Bindable.
This simplifies writing generic MLIR snippets.
For instance, a generic pointwise add can now be written:
```cpp
// Different Bindable ivs, one per loop in the loop nest.
auto ivs = makeBindables(shapeA.size());
Bindable zero, one;
// Same bindable, all equal to `zero`.
SmallVector<Bindable, 8> zeros(ivs.size(), zero);
// Same bindable, all equal to `one`.
SmallVector<Bindable, 8> ones(ivs.size(), one);
// clang-format off
Bindable A, B, C;
Stmt scalarA, scalarB, tmp;
Stmt block = edsc::Block({
  ForNest(ivs, zeros, shapeA, ones, {
    scalarA = load(A, ivs),
    scalarB = load(B, ivs),
    tmp = scalarA + scalarB,
    store(tmp, C, ivs)
  }),
});
// clang-format on
```
This CL also adds some extra support for pretty printing that will be used in
a future CL when we introduce standalone testing of EDSCs. At the moment we
are lacking the basic infrastructure to write such tests.
PiperOrigin-RevId: 229375850
Arguably, the dependence of EDSCs on Analysis is not great, but on the other
hand this is a strict improvement in the emitted IR, and since EDSCs are an
alternative to builders, it makes sense for them to have as much access to
Analysis as Transforms do.
PiperOrigin-RevId: 228967624
- when SSAValue/MLValue existed, code at several places was forced to create additional
aggregate temporaries of SmallVector<SSAValue/MLValue> to handle the conversion; get
rid of such redundant code
- use filling ctors instead of explicit loops
- for SmallVectors, change insert(list.end(), ...) -> append(...)
- improve comments at various places
- turn getMemRefAccess into MemRefAccess ctor and drop duplicated
getMemRefAccess. In the next CL, provide getAccess() accessors for load,
store and DMA ops to return a MemRefAccess.
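The container idioms from the bullets above can be sketched with `std::vector`, used only so the sketch builds without LLVM; the helper names are hypothetical. `llvm::SmallVector` offers the same fill constructor, and its `append(first, last)` replaces the `insert(end(), first, last)` spelling shown here.

```cpp
#include <vector>

// Before: build a vector of `n` copies of `value` with an explicit loop.
inline std::vector<int> filledWithLoop(unsigned n, int value) {
  std::vector<int> result;
  for (unsigned i = 0; i < n; ++i) result.push_back(value);
  return result;
}

// After: the filling constructor states the same intent directly.
inline std::vector<int> filledWithCtor(unsigned n, int value) {
  return std::vector<int>(n, value);
}

// Appending a range; with llvm::SmallVector this would be `dst.append(b, e)`.
inline void appendRange(std::vector<int> &dst, const std::vector<int> &src) {
  dst.insert(dst.end(), src.begin(), src.end());
}
```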
PiperOrigin-RevId: 228243638
This CL introduces a simple set of Embedded Domain-Specific Components (EDSCs)
in MLIR:
1. a `Type` system of shell classes that closely matches the MLIR type system. These
types are subdivided into `Bindable` leaf expressions and non-bindable `Expr`
expressions;
2. an `MLIREmitter` class whose purpose is to:
a. maintain a map of `Bindable` leaf expressions to concrete SSAValue*;
b. provide helper functionality to specify bindings of `Bindable` classes to
SSAValue* while verifying conformable types;
c. traverse the `Expr` and emit the MLIR.
This is used on a concrete example to implement MemRef load/store with clipping in the
LowerVectorTransfer pass. More specifically, the following pseudo-C++ code:
```c++
MLFuncBuilder *b = ...;
Location location = ...;
Bindable zero, one, expr, size;
// EDSL expression
auto access = select(expr < zero, zero, select(expr < size, expr, size - one));
auto ssaValue = MLIREmitter(b)
.bind(zero, ...)
.bind(one, ...)
.bind(expr, ...)
.bind(size, ...)
.emit(location, access);
```
is used to emit all the MLIR for a clipped MemRef access.
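To make the clipping semantics concrete, here is a toy evaluator in the spirit of `MLIREmitter`: bindable leaves are mapped to concrete values and the expression tree is traversed, but it computes integers instead of emitting MLIR. `ToyEmitter`, `bindable` and the other names are hypothetical; `select(c, a, b)` is assumed to yield `a` when `c` holds and `b` otherwise, so `access` clamps `expr` into `[0, size - 1]`.

```cpp
#include <map>
#include <memory>
#include <stdexcept>

struct Node;

// Toy Expr (hypothetical): a handle to an immutable node of the expression tree.
struct Expr {
  std::shared_ptr<const Node> node;
};

struct Node {
  enum class Kind { Bindable, Add, Sub, Lt, Select };
  Kind kind;
  Expr a, b, c; // operands; unused ones stay empty
};

inline Expr makeNode(Node::Kind kind, Expr a = {}, Expr b = {}, Expr c = {}) {
  return Expr{std::make_shared<Node>(Node{kind, a, b, c})};
}

// A leaf whose value is supplied later through ToyEmitter::bind.
inline Expr bindable() { return makeNode(Node::Kind::Bindable); }
inline Expr operator+(Expr x, Expr y) { return makeNode(Node::Kind::Add, x, y); }
inline Expr operator-(Expr x, Expr y) { return makeNode(Node::Kind::Sub, x, y); }
inline Expr operator<(Expr x, Expr y) { return makeNode(Node::Kind::Lt, x, y); }
inline Expr select(Expr cond, Expr t, Expr f) {
  return makeNode(Node::Kind::Select, cond, t, f);
}

// Maps bindable leaves (by identity) to concrete values, then traverses the
// tree; where MLIREmitter would emit MLIR, this toy computes an integer.
class ToyEmitter {
public:
  ToyEmitter &bind(const Expr &leaf, long value) {
    bindings[leaf.node.get()] = value;
    return *this;
  }

  long emit(const Expr &e) const {
    const Node &n = *e.node;
    switch (n.kind) {
    case Node::Kind::Bindable: {
      auto it = bindings.find(&n);
      if (it == bindings.end()) throw std::runtime_error("unbound leaf");
      return it->second;
    }
    case Node::Kind::Add: return emit(n.a) + emit(n.b);
    case Node::Kind::Sub: return emit(n.a) - emit(n.b);
    case Node::Kind::Lt: return emit(n.a) < emit(n.b) ? 1 : 0;
    case Node::Kind::Select: return emit(n.a) != 0 ? emit(n.b) : emit(n.c);
    }
    throw std::logic_error("unknown node kind");
  }

private:
  std::map<const Node *, long> bindings;
};
```

Building `select(expr < zero, zero, select(expr < size, expr, size - one))` with these helpers and binding `size` to 5 maps -3 to 0, 2 to 2 and 7 to 4.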
This simple EDSL can easily be extended to more powerful patterns and should
serve as the counterpart to pattern matchers (and could potentially be unified
once we get enough experience).
In the future, most of this code should be generated with TableGen, but for now
it already has a concrete, valuable use: making MLIR programmable in a
declarative fashion.
This CL also adds Stmt, proper supporting free functions and rewrites
VectorTransferLowering fully using EDSCs.
The code for creating the EDSCs emitting a VectorTransferReadOp as loops
with clipped loads is:
```c++
Stmt block = Block({
  tmpAlloc = alloc(tmpMemRefType),
  vectorView = vector_type_cast(tmpAlloc, vectorMemRefType),
  ForNest(ivs, lbs, ubs, steps, {
    scalarValue = load(scalarMemRef, accessInfo.clippedScalarAccessExprs),
    store(scalarValue, tmpAlloc, accessInfo.tmpAccessExprs),
  }),
  vectorValue = load(vectorView, zero),
  tmpDealloc = dealloc(tmpAlloc.getLHS())});
emitter.emitStmt(block);
```
where `accessInfo.clippedScalarAccessExprs` is created with:
```c++
select(i + ii < zero, zero, select(i + ii < N, i + ii, N - one));
```
The generated MLIR resembles:
```mlir
%1 = dim %0, 0 : memref<?x?x?x?xf32>
%2 = dim %0, 1 : memref<?x?x?x?xf32>
%3 = dim %0, 2 : memref<?x?x?x?xf32>
%4 = dim %0, 3 : memref<?x?x?x?xf32>
%5 = alloc() : memref<5x4x3xf32>
%6 = vector_type_cast %5 : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>>
for %i4 = 0 to 3 {
  for %i5 = 0 to 4 {
    for %i6 = 0 to 5 {
      %7 = affine_apply #map0(%i0, %i4)
      %8 = cmpi "slt", %7, %c0 : index
      %9 = affine_apply #map0(%i0, %i4)
      %10 = cmpi "slt", %9, %1 : index
      %11 = affine_apply #map0(%i0, %i4)
      %12 = affine_apply #map1(%1, %c1)
      %13 = select %10, %11, %12 : index
      %14 = select %8, %c0, %13 : index
      %15 = affine_apply #map0(%i3, %i6)
      %16 = cmpi "slt", %15, %c0 : index
      %17 = affine_apply #map0(%i3, %i6)
      %18 = cmpi "slt", %17, %4 : index
      %19 = affine_apply #map0(%i3, %i6)
      %20 = affine_apply #map1(%4, %c1)
      %21 = select %18, %19, %20 : index
      %22 = select %16, %c0, %21 : index
      %23 = load %0[%14, %i1, %i2, %22] : memref<?x?x?x?xf32>
      store %23, %5[%i6, %i5, %i4] : memref<5x4x3xf32>
    }
  }
}
%24 = load %6[%c0] : memref<1xvector<5x4x3xf32>>
dealloc %5 : memref<5x4x3xf32>
```
In particular notice that only 3 out of the 4-d accesses are clipped: this
corresponds indeed to the number of dimensions in the super-vector.
This CL also addresses the cleanups resulting from the review of the previous
CL and performs some refactoring to simplify the abstraction.
PiperOrigin-RevId: 227367414