Summary:
This revision adds a tool that generates the ODS and C++ implementation for "named" Linalg ops according to the [RFC discussion](https://llvm.discourse.group/t/rfc-declarative-named-ops-in-the-linalg-dialect/745).
While the mechanisms and language aspects are by no means set in stone, this revision allows connecting the pieces end-to-end from a mathematical-like specification.
Some implementation details and short-term decisions taken for the purpose of bootstrapping and that are not set in stone include:
1. using a "[Tensor Comprehension](https://arxiv.org/abs/1802.04730)-inspired" syntax
2. implicit and eager discovery of dims and symbols when parsing
3. using EDSC ops to specify the computation (e.g. std_addf, std_mul_f, ...)
A followup revision will connect this tool to tablegen mechanisms and allow the emission of named Linalg ops that automatically lower to various loop forms and run end to end.
For the following "Tensor Comprehension-inspired" string:
```
def batch_matmul(A: f32(Batch, M, K), B: f32(K, N)) -> (C: f32(Batch, M, N)) {
C(b, m, n) = std_addf<k>(std_mulf(A(b, m, k), B(k, n)));
}
```
With -gen-ods-decl=1, this emits (modulo formatting):
```
def batch_matmulOp : LinalgNamedStructured_Op<"batch_matmul", [
NInputs<2>,
NOutputs<1>,
NamedStructuredOpTraits]> {
let arguments = (ins Variadic<LinalgOperand>:$views);
let results = (outs Variadic<AnyRankedTensor>:$output_tensors);
let extraClassDeclaration = [{
llvm::Optional<SmallVector<StringRef, 8>> referenceIterators();
llvm::Optional<SmallVector<AffineMap, 8>> referenceIndexingMaps();
void regionBuilder(ArrayRef<BlockArgument> args);
}];
let hasFolder = 1;
}
```
With -gen-ods-impl, this emits (modulo formatting):
```
llvm::Optional<SmallVector<StringRef, 8>> batch_matmul::referenceIterators() {
return SmallVector<StringRef, 8>{ getParallelIteratorTypeName(),
getParallelIteratorTypeName(),
getParallelIteratorTypeName(),
getReductionIteratorTypeName() };
}
llvm::Optional<SmallVector<AffineMap, 8>> batch_matmul::referenceIndexingMaps()
{
MLIRContext *context = getContext();
AffineExpr d0, d1, d2, d3;
bindDims(context, d0, d1, d2, d3);
return SmallVector<AffineMap, 8>{
AffineMap::get(4, 0, {d0, d1, d3}),
AffineMap::get(4, 0, {d3, d2}),
AffineMap::get(4, 0, {d0, d1, d2}) };
}
void batch_matmul::regionBuilder(ArrayRef<BlockArgument> args) {
using namespace edsc;
using namespace intrinsics;
ValueHandle _0(args[0]), _1(args[1]), _2(args[2]);
ValueHandle _4 = std_mulf(_0, _1);
ValueHandle _5 = std_addf(_2, _4);
(linalg_yield(ValueRange{ _5 }));
}
```
Differential Revision: https://reviews.llvm.org/D77067
Summary:
Looks like a refactor that was never completed.
This change removes some unused and ambiguous definitions.
Reviewed By: bondhugula, nicolasvasilache, rriddle
Differential Revision: https://reviews.llvm.org/D75586
Add one more simplification for floordiv and mod affine expressions.
Examples:
(2*d0 + 1) floordiv 2 is simplified to d0
(8*d0 + 4*d1 + d2) floordiv 4 simplified to 4*d0 + d1 + d2 floordiv 4.
etc.
Similarly, (4*d1 + 1) mod 2 is simplified to 1,
(2*d0 + 8*d1) mod 8 simplified to 2*d0 mod 8.
Change getLargestKnownDivisor to return int64_t to be consistent and
to avoid casting at call sites (since the return value is used in expressions
of int64_t/index type).
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#202
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/202 from bondhugula:affine b13fcb2f1c00a39ca5434613a02408e085a80e77
PiperOrigin-RevId: 284866710
- fix store to load forwarding for a certain set of cases (where
forwarding shouldn't have happened); use AffineValueMap difference
based MemRefAccess equality checking; utility logic is also greatly
simplified
- add missing equality/inequality operators for AffineExpr ==/!= ints
- add == != operators on MemRefAccess
Closestensorflow/mlir#136
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/136 from bondhugula:store-load-forwarding d79fd1add8bcfbd9fa71d841a6a9905340dcd792
PiperOrigin-RevId: 270457011
- fix missing check while simplifying an expression with floordiv to a
mod
- fixes issue tensorflow/mlir#82
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#84
PiperOrigin-RevId: 264338353
This CL refactors tiling to enable tiling of views that are not just specified by a simple permutation. This allows the tiling of convolutions for which a new example is added.
PiperOrigin-RevId: 256346028
Affine expressions are designed as components of an attribute and are unique'd
in the MLIRContext. When affine expressions were implemented, uniqu'ing
objects in a context required to modify MLIRContext implementation. This is no
longer the case as generic StorageUniquer has been introduced. Port the
AffineExpr construction to use the new infrastructure by introducing an
affineUniquer into the MLIRContext.
--
PiperOrigin-RevId: 249207539
Analysis - NFC
- refactor AffineExprFlattener (-> SimpleAffineExprFlattener) so that it
doesn't depend on FlatAffineConstraints, and so that FlatAffineConstraints
could be moved out of IR/; the simplification that the IR needs for
AffineExpr's doesn't depend on FlatAffineConstraints
- have AffineExprFlattener derive from SimpleAffineExprFlattener to use for
all Analysis/Transforms purposes; override addLocalFloorDivId in the derived
class
- turn addAffineForOpDomain into a method on FlatAffineConstraints
- turn AffineForOp::getAsValueMap into an AffineValueMap ctor
PiperOrigin-RevId: 235283610
- compute slices precisely where the destination iteration depends on multiple source
iterations (instead of over-approximating to the whole source loop extent)
- update unionBoundingBox to deal with input with non-matching symbols
- reenable disabled backend test case
PiperOrigin-RevId: 234714069
* AffineStructures has moved to IR.
* simplifyAffineExpr/simplifyAffineMap/getFlattenedAffineExpr have moved to IR.
* makeComposedAffineApply/fullyComposeAffineMapAndOperands have moved to AffineOps.
* ComposeAffineMaps is replaced by AffineApplyOp::canonicalize and deleted.
PiperOrigin-RevId: 232586468
This CL is the 1st on the path to simplifying AffineMap composition.
This CL uses the now accepted AffineExpr.replaceDimsAndSymbols to
implement `AffineExpr::compose(AffineMap)`.
Arguably, `simplifyAffineExpr` should be part of IR and not Analysis but
this CL does not yet pull the trigger on that.
PiperOrigin-RevId: 228265845
getAffineBinaryOpExpr for consistency (NFC)
- this is consistent with the name of the class and getAffineDimExpr/ConstantExpr, etc.
PiperOrigin-RevId: 228164959
Supervectorization does not plan on handling multi-result AffineMaps and
non-canonical chains of > 1 AffineApplyOp.
This CL introduces a simpler abstraction and composition of single-result
unbounded AffineApplyOp by using the existing unbound AffineMap composition.
This CL adds a simple API call and relevant tests:
```c++
OpPointer<AffineApplyOp> makeNormalizedAffineApply(
FuncBuilder *b, Location loc, AffineMap map, ArrayRef<Value*> operands);
```
which creates a single-result unbounded AffineApplyOp.
The operands of AffineApplyOp are not themselves results of AffineApplyOp by
consrtuction.
This represent the simplest possible interface to complement the composition
of (mathematical) AffineMap, for the cases when we are interested in applying
it to Value*.
In this CL the composed AffineMap is not compressed (i.e. there exist operands
that are not part of the result). A followup commit will compress to normal
form.
The single-result unbounded AffineApplyOp abstraction will be used in a
followup CL to support the MaterializeVectors pass.
PiperOrigin-RevId: 227879021
symbols.
Included with this is some other infra:
- Testcases for other canonicalizations that I will implement next.
- Some helpers in AffineMap/Expr for doing simple walks without defining whole
visitor classes.
- A 'replaceDimsAndSymbols' facility that I'll be using to simplify maps and
exprs, e.g. to fold one constant into a mapping and to drop/renumber unused dims.
- Allow index (and everything else) to work in memref's, as we previously
discussed, to make the testcase easier to write.
- A "getAffineBinaryExpr" helper to produce a binop when you know the kind as
an enum.
This line of work will eventually subsume the ComposeAffineApply pass, but it is no where close to that yet :-)
PiperOrigin-RevId: 227852951
This CL adds the following free functions:
```
/// Returns the AffineExpr e o m.
AffineExpr compose(AffineExpr e, AffineMap m);
/// Returns the AffineExpr f o g.
AffineMap compose(AffineMap f, AffineMap g);
```
This addresses the issue that AffineMap composition is only available at a
distance via AffineValueMap and is thus unusable on Attributes.
This CL thus implements AffineMap composition in a more modular and composable
way.
This CL does not claim that it can be a good replacement for the
implementation in AffineValueMap, in particular it does not support bounded
maps atm.
Standalone tests are added that replicate some of the logic of the AffineMap
composition pass.
Lastly, affine map composition is used properly inside MaterializeVectors and
a standalone test is added that requires permutation_map composition with a
projection map.
PiperOrigin-RevId: 224376870
This CL implements a very simple loop vectorization **test** and the basic
infrastructure to support it.
The test simply consists in:
1. matching the loops in the MLFunction and all the Load/Store operations
nested under the loop;
2. testing whether all the Load/Store are contiguous along the innermost
memory dimension along that particular loop. If any reference is
non-contiguous (i.e. the ForStmt SSAValue appears in the expression), then
the loop is not-vectorizable.
The simple test above can gradually be extended with more interesting
behaviors to account for the fact that a layout permutation may exist that
enables contiguity etc. All these will come in due time but it is worthwhile
noting that the test already supports detection of outer-vetorizable loops.
In implementing this test, I also added a recursive MLFunctionMatcher and some
sugar that can capture patterns
such as `auto gemmLike = Doall(Doall(Red(LoadStore())))` and allows iterating
on the matched IR structures. For now it just uses in order traversal but
post-order DFS will be useful in the future once IR rewrites start occuring.
One may note that the memory management design decision follows a different
pattern from MLIR. After evaluating different designs and how they quickly
increase cognitive overhead, I decided to opt for the simplest solution in my
view: a class-wide (threadsafe) RAII context.
This way, a pass that needs MLFunctionMatcher can just have its own locally
scoped BumpPtrAllocator and everything is cleaned up when the pass is destroyed.
If passes are expected to have a longer lifetime, then the contexts can easily
be scoped inside the runOnMLFunction call and storage lifetime reduced.
Lastly, whatever the scope of threading (module, function, pass), this is
expected to also be future-proof wrt concurrency (but this is a detail atm).
PiperOrigin-RevId: 217622889
This CL sketches what it takes for AffineExpr to fully have by-value semantics
and not be a not-so-smart pointer anymore.
This essentially makes the underyling class a simple storage struct and
implements the operations on the value type directly. Since there is no
forwarding of operations anymore, we can full isolate the storage class and
make a hard visibility barrier by moving detail::AffineExpr into
AffineExprDetail.h.
AffineExprDetail.h is only included where storage-related information is
needed.
PiperOrigin-RevId: 216385459
This CL:
1. performs the global codemod AffineXExpr->AffineXExprClass and
AffineXExprRef -> AffineXExpr;
2. simplifies function calls by removing the redundant MLIRContext parameter;
3. adds missing binary operator versions of scalar op AffineExpr where it
makes sense.
PiperOrigin-RevId: 216242674
This CL introduces a series of cleanups for AffineExpr value types:
1. to make it clear that the value types should be used, the pointer
AffineExpr types are put in the detail namespace. Unfortunately, since the
value type operator-> only forwards to the underlying pointer type, we
still
need to expose this in the include file for now;
2. AffineExprKind is ok to use, it thus comes out of detail and thus of
AffineExpr
3. getAffineDimExpr, getAffineSymbolExpr, getAffineConstantExpr are
similarly
extracted as free functions and their naming is mande consistent across
Builder, MLContext and AffineExpr
4. AffineBinaryOpEx::simplify functions are made into static free
functions.
In particular it is moved away from AffineMap.cpp where it does not belong
5. operator AffineExprType is made explicit
6. uses the binary operators everywhere possible
7. drops the pointer usage everywhere outside of AffineExpr.cpp,
MLIRContext.cpp and AsmPrinter.cpp
PiperOrigin-RevId: 216207212
This CL makes AffineExprRef into a value type.
Notably:
1. drops llvm isa, cast, dyn_cast on pointer type and uses member functions on
the value type. It may be possible to still use classof (in a followup CL)
2. AffineBaseExprRef aggressively casts constness away: if we mean the type is
immutable then let's jump in with both feet;
3. Drop implicit casts to the underlying pointer type because that always
results in surprising behavior and is not needed in practice once enough
cleanup has been applied.
The remaining negative I see is that we still need to mix operator. and
operator->. There is an ugly solution that forwards the methods but that ends
up duplicating the class hierarchy which I tried to avoid as much as
possible. But maybe it's not that bad anymore since AffineExpr.h would still
contain a single class hierarchy (the duplication would be impl detail in.cpp)
PiperOrigin-RevId: 216188003
This CL starts by replacing AffineExpr* with value-type AffineExprRef in a few
places in the IR. By a domino effect that is pretty telling of the
inconsistencies in the codebase, const is removed where it makes sense.
The rationale is that the decision was concisously made that unique'd types
have pointer semantics without const specifier. This is fine but we should be
consistent. In the end, the only logical invariant is that there should never
be such a thing as a const AffineExpr*, const AffineMap* or const IntegerSet*
in our codebase.
This CL takes a number of shortcuts to killing const with fire, in particular
forcing const AffineExprRef to return the underlying non-const
AffineExpr*. This will be removed once AffineExpr* has disappeared in
containers but for now such shortcuts allow a bit of sanity in this long quest
for cleanups.
The **only** places where const AffineExpr*, const AffineMap* or const
IntegerSet* may still appear is by transitive needs from containers,
comparison operators etc.
There is still one major thing remaining here: figure out why cast/dyn_cast
return me a const AffineXXX*, which in turn requires a bunch of ugly
const_casts. I suspect this is due to the classof
taking const AffineXXXExpr*. I wonder whether this is a side effect of 1., if
it is coming from llvm itself (I'd doubt it) or something else (clattner@?)
In light of this, the whole discussion about const makes total sense to me now
and I would systematically apply the rule that in the end, we should never
have any const XXX in our codebase for unique'd types (assuming we can remove
them all in containers and no additional constness constraint is added on us
from the outside world).
PiperOrigin-RevId: 215811554
This CL implements AffineExprBaseRef as a templated type to allow LLVM-style
casts to work properly. This also allows making AffineExprBaseRef::expr
private.
To achieve this, it is necessary to use llvm::simplify_type and make
AffineConstExpr derive from both AffineExpr and llvm::simplify<AffineExprRef>.
Note that llvm::simplify_type is just an interface to enable the proper
template resolution of isa/cast/dyn_cast but it otherwise holds no value.
Lastly note that certain dyn_cast operations wanted the const AffineExpr* form
of AffineExprBaseRef so I made the implicit constructor take that by default
and documented the immutable behavior. I think this is consistent with the
decision to make unique'd type immutable by convention and never use const on
them.
PiperOrigin-RevId: 215642247
This CL uniformizes the uses of AffineExprWrap outside of IR.
The public API of AffineExpr builder is modified to only use AffineExprWrap.
A few places access AffineExprWrap.expr, this is only while the API is in
transition to easily keep track (i.e. make expr private and let the compiler
track the errors).
Parser.cpp exhibits patterns that are dependent on nullptr values so
converting it is left for another CL.
PiperOrigin-RevId: 215642005
This CL proposes adding MLIRContext* to AffineExpr as discussed previously.
This allows the value class to not require the context in its constructor and
makes it a POD that it makes sense to pass by value everywhere.
A list of other RFC CLs will build on this. The RFC CLs are small incremental
pushes of the API which would be a pretty big change otherwise.
Pushing the thinking a little bit more it seems reasonable to use implicit
cast/constructor to/from AffineExpr*.
As this thing evolves, it looks to me like IR (and
probably Parser, for not so good reasons) want to operate on AffineExpr* and
the rest of the code wants to operate on the value type.
For this reason I think AffineExprImpl*/AffineExpr may also make sense but I
do not have a particular naming preference.
The jury is still out for naming decision between the above and
AffineExprBase*/AffineExpr or AffineExpr*/AffineExprRef.
PiperOrigin-RevId: 215641596
This CL argues that the builder API for AffineExpr should be used
with a lightweight wrapper that supports operators chaining.
This CL takes the ill-named AffineExprWrap and proposes a simple
set of operators with builtin constant simplifications.
This allows:
1. removing the getAddMulPureAffineExpr function;
2. avoiding concerns about constant vs non-constant simplifications
at **every call site**;
3. writing the mathematical expressions we want to write without unnecessary
obfuscations.
The points above represent pure technical debt that we don't want to carry on.
It is important to realize that this is not a mere convenience or "just sugar"
but reduction in cognitive overhead.
This thinking can be pushed significantly further, I have added some comments
with some basic ideas but we could make AffineMap, AffineApply and other
objects that use map applications more functional and value-based.
I am putting this out to get a first batch of reviews and see what people
think.
I think in my preferred design I would have the Builder directly return such
AffineExprPtr objects by value everywhere and avoid the boilerplate explicit
creations that I am doing by hand at this point.
Yes this AffineExprPtr would implicitly convert to AffineExpr* because that is
what it is.
PiperOrigin-RevId: 215641317
Use these methods to simplify existing code. Rename getConstantMap
getConstantAffineMap. Move declarations to group similar ones together.
PiperOrigin-RevId: 212814829
unroll/unroll-and-jam more powerful; add additional affine expr builder methods
- use previously added analysis/simplification to infer multiple of unroll
factor trip counts, making loop unroll/unroll-and-jam more general.
- for loop unroll, support bounds that are single result affine map's with the
same set of operands. For unknown loop bounds, loop unroll will now work as
long as trip count can be determined to be a multiple of unroll factor.
- extend getConstantTripCount to deal with single result affine map's with the
same operands. move it to mlir/Analysis/LoopAnalysis.cpp
- add additional builder utility methods for affine expr arithmetic
(difference, mod/floordiv/ceildiv w.r.t postitive constant). simplify code to
use the utility methods.
- move affine analysis routines to AffineAnalysis.cpp/.h from
AffineStructures.cpp/.h.
- Rename LoopUnrollJam to LoopUnrollAndJam to match class name.
- add an additional simplification for simplifyFloorDiv, simplifyCeilDiv
- Rename AffineMap::getNumOperands() getNumInputs: an affine map by itself does
not have operands. Operands are passed to it through affine_apply, from loop
bounds/if condition's, etc., operands are stored in the latter.
This should be sufficiently powerful for now as far as unroll/unroll-and-jam go for TPU
code generation, and can move to other analyses/transformations.
Loop nests like these are now unrolled without any cleanup loop being generated.
for %i = 1 to 100 {
// unroll factor 4: no cleanup loop will be generated.
for %j = (d0) -> (d0) (%i) to (d0) -> (5*d0 + 3) (%i) {
%x = "foo"(%j) : (affineint) -> i32
}
}
for %i = 1 to 100 {
// unroll factor 4: no cleanup loop will be generated.
for %j = (d0) -> (d0) (%i) to (d0) -> (d0 - d mod 4 - 1) (%i) {
%y = "foo"(%j) : (affineint) -> i32
}
}
for %i = 1 to 100 {
for %j = (d0) -> (d0) (%i) to (d0) -> (d0 + 128) (%i) {
%x = "foo"() : () -> i32
}
}
TODO(bondhugula): extend this to LoopUnrollAndJam as well in the next CL (with minor
changes).
PiperOrigin-RevId: 212661212
Outside of IR/
- simplify a MutableAffineMap by flattening the affine expressions
- add a simplify affine expression pass that uses this analysis
- update the FlatAffineConstraints API (to be used in the next CL)
In IR:
- add isMultipleOf and getKnownGCD for AffineExpr, and make the in-IR
simplication of simplifyMod simpler and more powerful.
- rename the AffineExpr visitor methods to distinguish b/w visiting and
walking, and to simplify API names based on context.
The next CL will use some of these for the loop unrolling/unroll-jam to make
the detection for the need of cleanup loop powerful/non-trivial.
A future CL will finally move this simplification to FlatAffineConstraints to
make it more powerful. For eg., currently, even if a mod expr appearing in a
part of the expression tree can't be simplified, the whole thing won't be
simplified.
PiperOrigin-RevId: 211012256
- Drop sub-classing of affine binary op expressions.
- Drop affine expr op kind sub. Represent it as multiply by -1 and add. This
will also be in line with the math form when we'll need to represent a system of
linear equalities/inequalities: the negative number goes into the coefficient
of an affine form. (For eg. x_1 + (-1)*x_2 + 3*x_3 + (-2) >= 0). The folding
simplification will transparently deal with multiplying the -1 with any other
constants. This also means we won't need to simplify a multiply expression
like in x_1 + (-2)*x_2 to a subtract expression (x_1 - 2*x_2) for
canonicalization/uniquing.
- When we print the IR, we will still pretty print to a subtract when possible.
PiperOrigin-RevId: 205298958
- fold constants when possible.
- for a mul expression, canonicalize to always keep the LHS as the
constant/symbolic term, and similarly, the RHS for an add expression to keep
it closer to the mathematical form. (Eg: f(x) = 3*x + 5)); other similar simplifications;
- verify binary op expressions at creation time.
TODO: we can completely drop AffineSubExpr, and instead use add and mul by -1.
This way something like x - 4 and -4 + x get canonicalized to x + -1 * 4
instead of being x - 4 and x + -4. (The other alternative if wanted to retain
AffineSubExpr would be to simplify x + -1*y to x - y and x + <neg number> to x
- <pos number>).
PiperOrigin-RevId: 204240258
use it.
This also removes "operand" from the affine expr classes: it is unnecessary
verbosity and "operand" will mean something very specific for SSA stuff (we
will have an Operand type).
PiperOrigin-RevId: 203976504
- check for non-affine expressions
- handle negative numbers and negation of id's, expressions
- functions to check if a map is pure affine or semi-affine
- simplify/clean up affine map parsing code
- report more errors messages, more accurate error messages
PiperOrigin-RevId: 203773633
Run test case:
$ mlir-opt test/IR/parser-affine-map.mlir
test/IR/parser-affine-map.mlir:3:30: error: expect '(' at start of map range
#hello_world2 (i, j) [s0] -> i+s0, j)
^
PiperOrigin-RevId: 202736856
- parsing affine map identifiers
- place-holder classes for AffineMap
- module contains a list of affine maps (defined at the top level).
PiperOrigin-RevId: 202336919