* before/after pass execution
* after a pass fails
* before/after an analysis is computed
After getting this infrastructure in place, we can start providing common developer utilities like pass timing, IR printing after pass execution, etc.
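For illustration, an instrumentation hooking into these points could look roughly like the following; this is a sketch only, and the class name and hook signatures are assumptions mirroring the bullets above, not the exact API introduced here.
```cpp
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/raw_ostream.h"

// Rough sketch: hook names mirror the instrumentation points listed above.
struct PrintingInstrumentation {
  void runBeforePass(llvm::StringRef passName) {
    llvm::errs() << "about to run pass: " << passName << "\n";
  }
  void runAfterPass(llvm::StringRef passName) {
    llvm::errs() << "finished pass: " << passName << "\n";
  }
  void runAfterPassFailed(llvm::StringRef passName) {
    llvm::errs() << "pass failed: " << passName << "\n";
  }
  void runBeforeAnalysis(llvm::StringRef analysisName) {
    llvm::errs() << "computing analysis: " << analysisName << "\n";
  }
  void runAfterAnalysis(llvm::StringRef analysisName) {
    llvm::errs() << "computed analysis: " << analysisName << "\n";
  }
};
```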
PiperOrigin-RevId: 237709692
Declarative builders want to provide the same nesting interface for blocks and loops. MLIR on the other hand has different behaviors:
1. when an AffineForOp is created the insertion point does not enter the loop body;
2. when a Block is created, the insertion point does enter the block body.
Guard against the second behavior in EDSC to make the interface unsurprising.
This also surfaces two places in the eager branch API where I was guarding against this behavior indirectly by creating a new ScopedContext.
Instead, make everything uniform by properly resetting the insertion point in the single place that builds the mlir::Block*.
PiperOrigin-RevId: 237619513
This CL addresses a few post-submit comments:
1. better comments,
2. check number of results before dyn_cast (which is a less common case)
3. test usage for multi-result InstructionHandle
PiperOrigin-RevId: 237549333
This CL adds support for named custom instructions in declarative builders.
To allow this, it introduces a templated `CustomInstruction` class.
This CL also splits ValueHandle, which can capture only the **value** of single-valued instructions, from InstructionHandle, which can capture any instruction but provides no typing or sugaring to extract the potential Value*.
PiperOrigin-RevId: 237543222
There are two ways that we can attach a name to a DAG node:
1) (Op:$name ...)
2) (Op ...):$name
The problem with 2) is that we cannot do it on the outermost DAG node in a tree.
Switch from 2) to 1).
PiperOrigin-RevId: 237513962
This CL added the ability to generate multiple ops using multiple result
patterns, with each of them replacing one result of the matched source op.
Specifically, the syntax is
```
def : Pattern<(SourceOp ...),
[(ResultOp1 ...), (ResultOp2 ...), (ResultOp3 ...)]>;
```
assuming `SourceOp` has three results.
Currently we require that each result op generate exactly one result; this
restriction can be lifted later when use cases arise.
To help with cases where a certain output is unused and we don't care about it,
this CL also introduces a new directive: `verifyUnusedValue`. Checks are
emitted in the `match()` method so that if the corresponding output is actually
used, `match()` returns `matchFailure()`.
PiperOrigin-RevId: 237513904
TensorFlow does not allow integers of arbitrary bitwidths: it only accepts 8-,
16-, 32-, and 64-bit integer types. Similarly, for floating-point types, only
half, single, double, and bfloat16 types are accepted.
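For illustration, the integer restriction boils down to a check along these lines (the helper name is hypothetical):
```cpp
// Hypothetical helper illustrating the accepted TensorFlow integer bitwidths.
static bool isValidTFIntegerBitwidth(unsigned bitwidth) {
  return bitwidth == 8 || bitwidth == 16 || bitwidth == 32 || bitwidth == 64;
}
```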
PiperOrigin-RevId: 237483913
- compute tile sizes based on a simple model that looks at memory footprints
(instead of using the hardcoded default value)
- adjust tile sizes to make them factors of trip counts based on an option
- update loop fusion CL options to allow setting maximal fusion at pass creation
- change an emitError to emitWarning (since it's not a hard error unless the client
treats it that way, in which case, it can emit one)
$ mlir-opt -debug-only=loop-tile -loop-tile test/Transforms/loop-tiling.mlir
test/Transforms/loop-tiling.mlir:81:3: note: using tile sizes [4 4 5 ]
for %i = 0 to 256 {
for %i0 = 0 to 256 step 4 {
  for %i1 = 0 to 256 step 4 {
    for %i2 = 0 to 250 step 5 {
      for %i3 = #map4(%i0) to #map11(%i0) {
        for %i4 = #map4(%i1) to #map11(%i1) {
          for %i5 = #map4(%i2) to #map12(%i2) {
            %0 = load %arg0[%i3, %i5] : memref<8x8xvector<64xf32>>
            %1 = load %arg1[%i5, %i4] : memref<8x8xvector<64xf32>>
            %2 = load %arg2[%i3, %i4] : memref<8x8xvector<64xf32>>
            %3 = mulf %0, %1 : vector<64xf32>
            %4 = addf %2, %3 : vector<64xf32>
            store %4, %arg2[%i3, %i4] : memref<8x8xvector<64xf32>>
          }
        }
      }
    }
  }
}
PiperOrigin-RevId: 237461836
Recently, EDSC introduced an eager mode for building IR in different contexts.
Introduce Python bindings support for loop and loop nest contexts of EDSC
builders. The eager mode is built around the notion of ValueHandle, which is a
convenience class for delayed initialization and operator overloads. Expose
this class and overloads directly. The model of insertion contexts maps
naturally to Python context manager mechanism, therefore new bindings are
defined bypassing the C APIs. The bindings now provide three new context
manager classes: FunctionContext, LoopContext and LoopNestContext. The last
two can be used with the `with`-construct in Python to create loop (nests) and
obtain handles to the loop induction variables seamlessly:
with LoopContext(lhs, rhs, 1) as i:
  lhs + rhs + i
  with LoopContext(rhs, rhs + rhs, 2) as j:
    x = i + j
Any statement within the Python context will trigger immediate emission of the
corresponding IR constructs into the context owned by the nearest context
manager.
PiperOrigin-RevId: 237447732
The first version of TableGen-defined LLVM IR Dialect did not include the
mandatory or optional attributes of the operations due to the missing support
for some of the relevant attribute types. This support has been recently
introduced, along with named attributes as arguments in the TableGen operation
definitions. With these changes, LLVM IR Dialect operations now have factory
functions accepting (unnamed) attributes and attaching their canonical names.
Use these factories instead of manually constructing named attributes in the
dialect converter to avoid hardcoded attribute names in unexpected places.
PiperOrigin-RevId: 237237769
These cleanups reflect some recent changes to the LLVM IR Dialect and the
infrastructure that affects it. In particular, add documentation on direct and
indirect function calls as well as remove the `call` and `call0` separation.
Change the prefix of custom types from `!llvm.type` to `!llvm` so that it
matches the IR. Remove the verifier check disallowing conditional branches to
the same block with arguments: identical arguments are now supported, and
different arguments will be caught later.
PiperOrigin-RevId: 237203452
The LLVM IR Dialect strives to be close to the original LLVM IR instructions.
The conversion from the LLVM IR Dialect to LLVM IR proper is mostly mechanical
and can be automated. Implement TableGen support for generating conversions
from a concise pattern form in the TableGen definition of the LLVM IR Dialect
operations. It is used for all operations except calls and branches. These
operations need access to function and block remapping tables and would require
significantly more code to generate the conversions from TableGen definitions
than the current manually written conversions.
This implementation is accompanied by various necessary changes to the TableGen
operation definition infrastructure. In particular, operation definitions now
contain named accessors to results as well as named accessors to the variadic
operand (returning a vector of operands). The base operation support TableGen
file now contains a FunctionAttr definition. The TableGen infrastructure now
allows querying the names of the operation results.
PiperOrigin-RevId: 237203077
* bool succeeded(Status)
  - Returns true if the status corresponds to a success value.
* bool failed(Status)
  - Returns true if the status corresponds to a failure value.
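A minimal usage sketch of the helpers above; `Item` and `convertOne` are hypothetical placeholders standing in for whatever produces a Status:
```cpp
#include "llvm/ADT/ArrayRef.h"

struct Item;
Status convertOne(Item *item);  // assumed to exist, for illustration only

bool convertAll(llvm::ArrayRef<Item *> items) {
  for (Item *item : items)
    if (failed(convertOne(item)))
      return false;  // bail out on the first failing conversion
  return true;
}
```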
PiperOrigin-RevId: 237153884
Supports use case where FlatAffineConstraints::composeMap adds dim identifiers with no SSA values (because the identifiers are the result of an AffineValueMap which is not materialized in the IR and thus has no SSA Value results).
PiperOrigin-RevId: 237145506
This CL adds the same helper classes that exist in the AST form of EDSCs to support a basic indexing notation and emit the proper load and store operations and capture MemRefViews as function arguments.
This CL also adds a wrapper class LoopNestBuilder to allow generic rank-agnostic loops over indices.
PiperOrigin-RevId: 237113755
An implicit OpPointer -> OpType* conversion results in AddressSanitizer triggering a stack-use-after-scope error (which may be a false positive).
Avoid using such patterns to make life good again.
PiperOrigin-RevId: 237053863
Currently, Python bindings provide support for declaring and defining MLIR
functions given a list of argument and result types. Extend the support for
both function declaration and function definition to handle optional function
attributes and function argument attributes. Function attributes are exposed
as keyword arguments on function declaration and definition calls. Function
argument attributes are exposed through a special object that combines the
argument type and its list of attributes. Such objects can be passed instead
of bare types into the type declaration and definition calls. They can be
constructed from bare types and reused in different declarations.
Note that, from the beginning, Python bindings did not pass through C bindings
to declare and define functions. This commit keeps the direct interaction
between Python and C++.
PiperOrigin-RevId: 237047561
When building unstructured control-flow there is a need to construct mlir::Block* before being able to fill them. This invites goto-style programming.
This CL introduces an alternative eager API for BR and COND_BR in which blocks are created eagerly and captured on the fly.
This allows reducing the number of calls to `BlockBuilder` from 4 to 2 in the `builder_blocks_eager` test and from 3 to 2 in the `builder_cond_branch_eager` test.
PiperOrigin-RevId: 237046114
This CL adds support for BranchHandle and BranchBuilder that require a slightly different
abstraction since an mlir::Block is not an mlir::Value.
This CL also adds support for the BR and COND_BR instructions and the relevant tests.
PiperOrigin-RevId: 237034312
This CL reworks the design of EDSCs from first principles.
It introduces a ValueHandle which can hold either:
1. an eagerly typed, delayed Value*
2. a precomputed Value*
A ValueHandle can be manipulated with intrinsic operations and nested within a NestedBuilder. These NestedBuilders are a more idiomatic nested builder abstraction that should feel intuitive to program against in C++.
Notably, this abstraction does not require an AST to stage the construction of MLIR snippets from C++. Instead, the abstraction makes use of orderings between definition and declaration of ValueHandles and provides NestedBuilder and LoopBuilder helper classes to handle those orderings.
All instruction creations are meant to use the templated ValueHandle::create<> which directly calls mlir::Builder.create<>.
For now the EDSC AST and the builders live side-by-side until the C API is ported.
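For example, under this scheme creating an instruction reads roughly as follows; this is a sketch in which the operand handles `a` and `b` are assumed to already be bound within an enclosing builder:
```cpp
// Sketch: ValueHandle::create<OpTy>(args...) forwards to
// mlir::Builder::create<OpTy>(...) and captures the single result value.
ValueHandle sum = ValueHandle::create<AddFOp>(a, b);
```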
PiperOrigin-RevId: 237030945
The existing implementation of the Op definition generator assumes and relies
on the fact that native Op Attributes appear after its value-based operands in
the Arguments list. Furthermore, the same order is used in the generated
`build` function for the operation. This is not desirable for some operations
with mandatory attributes that would want the attribute to appear upfront for
better consistency with their textual representation; for example, `cmpi` would
prefer the `predicate` attribute to be foremost in the argument list.
Introduce support for using attributes and operands in the Arguments DAG in no
particular order. This is achieved by maintaining a list of Arguments that
point to either the value or the attribute and are used to generate the `build`
method.
PiperOrigin-RevId: 237002921
Adds utility to convert slice bounds to a FlatAffineConstraints representation.
Adds utility to FlatAffineConstraints to promote loop IV symbol identifiers to dim identifiers.
PiperOrigin-RevId: 236973261
- change this for consistency - everything else similar takes/returns a
Function pointer - the FuncBuilder ctor,
Block/Value/Instruction::getFunction(), etc.
- saves a whole bunch of &s everywhere
PiperOrigin-RevId: 236928761
- fix for the mod detection
- simplify/avoid the mod at construction (if the dividend is already known to be less
than the divisor), since the information is available at hand there
PiperOrigin-RevId: 236882988
The recently introduced support for generating MLIR Operations with optional
attributes did not handle the formatted string emission properly, in particular
it did not escape `{` and `}` in calls to `formatv` leading to assertions
during TableGen op definition generation. Fix this by splitting out the
unnecessary braces from the format string. Additionally, fix the emission of
the builder argument comment to correctly indicate which attributes are indeed
optional and which are not.
PiperOrigin-RevId: 236832230
- fix out of bounds test case
- -memref-bound-check on the test/Transforms/loop-fusion.mlir no longer reports any
errors, before or after -loop-fusion is run
PiperOrigin-RevId: 236757658
- this was detected when memref-bound-check was run on the output of the
loop-fusion pass
- the addition (to represent ceildiv as a floordiv) had to be performed only
for the constant term of the constraint
- update test cases
- memref-bound-check no longer returns an error on the output of this test case
PiperOrigin-RevId: 236731137
This fixes a bug: previously, during conversion function argument
attributes were neither being passed through nor converted. This fix
extends DialectConversion to allow for simultaneous conversion of the
function type and the argument attributes.
This was important when lowering MLIR to LLVM where attribute
information (e.g. noalias) needs to be preserved in MLIR(LLVMDialect).
In the longer run it seems reasonable that we want to convert the function
attributes, the function type, and the argument attributes together, but that
requires a small refactoring in Function.h to aggregate these three
fields in an inner struct, which will require some discussion.
PiperOrigin-RevId: 236709409
Dialect attributes are defined as:
  dialect-namespace `.` attr-name `:` attribute-value
Dialects can override any of the following hooks to verify the validity of a given attribute:
* verifyFunctionAttribute
* verifyFunctionArgAttribute
* verifyInstructionAttribute
PiperOrigin-RevId: 236507970
An analysis can be any class, but it must provide the following:
* A constructor for a given IR unit.
  struct MyAnalysis {
    // Compute this analysis with the provided module.
    MyAnalysis(Module *module);
  };
Analyses can be accessed from a Pass by calling either the 'getAnalysisResult<AnalysisT>' or 'getCachedAnalysisResult<AnalysisT>' method. A FunctionPass may query for a cached analysis on the parent module with 'getCachedModuleAnalysisResult'. Similarly, a ModulePass may query an analysis (which need not be cached) on a child function with 'getFunctionAnalysisResult'.
By default, when running a pass all cached analyses are set to be invalidated. If no transformation was performed, a pass can use the method 'markAllAnalysesPreserved' to preserve all analysis results. As noted above, preserving specific analyses is not yet supported.
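A rough sketch of how a pass would use these methods; the `success()` result value and the `needsRewrite` helper are assumptions for illustration, while the method names come from the description above:
```cpp
// Rough sketch only, not the exact API.
struct MyModulePass : public ModulePass<MyModulePass> {
  PassResult runOnModule() {
    // Computes MyAnalysis for the current module, or reuses a valid cached copy.
    MyAnalysis &analysis = getAnalysisResult<MyAnalysis>();
    if (!needsRewrite(analysis)) {
      // No transformation performed: keep all cached analyses valid.
      markAllAnalysesPreserved();
      return success();
    }
    // ... transform the module using `analysis` ...
    return success();
  }
};
```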
PiperOrigin-RevId: 236505642
- This change only impacts the cost model for fusion, given the way
  addSliceBounds was being used. It so happens that the output is the same in
  spite of this CL's fix; however, the added assertions no longer fail (an
  invalid/inconsistent memref region was being used earlier).
PiperOrigin-RevId: 236405030
This CL changes dialect op source files (.h, .cpp, .td) to follow the following
convention:
<full-dialect-name>/<dialect-namespace>Ops.{h|cpp|td}
Builtin and standard dialects are treated specially, though. Neither of them has
a dialect namespace; the former is still named BuiltinOps.* and the
latter is named as Ops.*.
Purely mechanical. NFC.
PiperOrigin-RevId: 236371358
- detect all parallel loops based on dep information and mark them with a
"parallel" attribute
- add mlir::isLoopParallel(OpPointer<AffineForOp> ...), and refactor an existing method
to use that (reuse some code from @andydavis (cl/236007073) for this)
- a simple/meaningful way to test the memref dependence analysis as well
Ex:
$ mlir-opt -detect-parallel test/Transforms/parallelism-detection.mlir
#map1 = ()[s0] -> (s0)
func @foo(%arg0: index) {
  %0 = alloc() : memref<1024x1024xvector<64xf32>>
  %1 = alloc() : memref<1024x1024xvector<64xf32>>
  %2 = alloc() : memref<1024x1024xvector<64xf32>>
  for %i0 = 0 to %arg0 {
    for %i1 = 0 to %arg0 {
      for %i2 = 0 to %arg0 {
        %3 = load %0[%i0, %i2] : memref<1024x1024xvector<64xf32>>
        %4 = load %1[%i2, %i1] : memref<1024x1024xvector<64xf32>>
        %5 = load %2[%i0, %i1] : memref<1024x1024xvector<64xf32>>
        %6 = mulf %3, %4 : vector<64xf32>
        %7 = addf %5, %6 : vector<64xf32>
        store %7, %2[%i0, %i1] : memref<1024x1024xvector<64xf32>>
      } {parallel: false}
    } {parallel: true}
  } {parallel: true}
  return
}
PiperOrigin-RevId: 236367368
*) Breaks the fusion pass into multiple sub-passes over nodes in the data dependence graph:
- first pass fuses single-use producers into their unique consumer.
- second pass enables fusing for input-reuse by fusing sibling nodes which read from the same memref, but which do not share dependence edges.
- third pass fuses remaining producers into their consumers (Note that the sibling fusion pass may have transformed a producer with multiple uses into a single-use producer).
*) Fusion for input reuse is enabled by computing a sibling node slice using the load/load accesses to the same memref, and fusion safety is guaranteed by checking that the sibling node memref write region (to a different memref) is preserved.
*) Enables output vector and output matrix computations from KFAC patches-second-moment operation to fuse into a single loop nest and reuse input from the image patches operation.
*) Adds a generic loop utility for finding all sequential loops in a loop nest.
*) Adds and updates unit tests.
PiperOrigin-RevId: 236350987
- handle floordiv/mod's in loop bounds for all analysis purposes
- allows fusion slicing to be more powerful
- add simple test cases based on -memref-bound-check
- fusion based test cases in follow up CLs
PiperOrigin-RevId: 236328551
- add a method to merge and align the spaces (identifiers) of two
FlatAffineConstraints (both get dimension-wise and symbol-wise unique
columns)
- this completes several TODOs, gets rid of previous assumptions/restrictions
in composeMap, unionBoundingBox, and reuses common code
- remove previous workarounds / duplicated functionality in
FlatAffineConstraints::composeMap and unionBoundingBox, use mergeAlignIds
from both
PiperOrigin-RevId: 236320581
EDSC Expressions can now be used to build arbitrary MLIR operations identified
by their canonical name, i.e. the name obtained from
`OpClass::getOperationName()` for registered operations. Expose this
functionality to the C API and Python bindings. This exposes builder-level
interface to Python and avoids the need for experimental Python code to
implement EDSC free function calls for constructing each op type.
This modification required exposing mlir::Attribute to the C API and Python
bindings, which only supports integer attributes for now.
This is step 4/n to making EDSCs more generalizable.
PiperOrigin-RevId: 236306776
When the LLVM IR dialect was implemented, TableGen operation definition scheme
did not support operations with variadic results. Therefore, the `call`
instruction was split into `call` and `call0` for the single- and zero-result
calls (LLVM does not support multi-result operations). Unify `call` and
`call0` using the recently added TableGen support for operations with Variadic
results. Explicitly verify that the new operation has 0 or 1 results. As a
side effect, this change enables clean-ups in the conversion to the LLVM IR
dialect that no longer needs to rely on wrapped LLVM IR void types when
constructing zero-result calls.
PiperOrigin-RevId: 236119197
The original implementation of OptUtils provided two different LLVM IR module
transformers to be used with the MLIR ExecutionEngine: OptimizingTransformer
parameterized by the optimization levels (similar to -O3 flags) and
LLVMPassesTransformer parameterized by a string formatted similarly to
command line options of LLVM's "opt" tool without support for -O* flags.
Introduce such support by declaring the flags inside the parser and by
populating the pass managers similarly to what "opt" does. Remove the
additional flags from mlir-cpu-runner as they can now be wrapped into
`-llvm-opts` together with other LLVM-related flags.
PiperOrigin-RevId: 236107292
- detect more trivially redundant constraints in
FlatAffineConstraints::removeTrivialRedundantConstraints. Redundancy due to
constraints that only differ in the constant part (e.g., 32i + 64j - 3 >= 0, 32i +
64j - 8 >= 0) is now detected. The method is still linear-time and does
a single scan over the FlatAffineConstraints buffer. This detection is useful
and needed to eliminate redundant constraints generated after FM elimination.
- update GCDTightenInequalities so that we also normalize by the GCD while at
it. This way more constraints will show up as redundant (232i - 203 >= 0
becomes i - 1 >= 0 instead of 232i - 232 >= 0) without having to call
normalizeConstraintsByGCD.
- In FourierMotzkinEliminate, call GCDTightenInequalities and
normalizeConstraintsByGCD before calling removeTrivialRedundantConstraints()
- so that more redundant constraints are detected. As a result, redundancy
due to constraints like i - 5 >= 0, i - 7 >= 0, 2i - 5 >= 0, 232i - 203 >=
0 is now detected (here only i >= 7 is non-redundant).
As a result of these, a -memref-bound-check on the added test case runs in 16ms
instead of 1.35s (opt build) and no longer returns a conservative result.
PiperOrigin-RevId: 235983550
The definitions of derived passes have now changed and passes must adhere to the following:
* Inherit from a CRTP base class FunctionPass/ModulePass.
  - This class provides several necessary utilities for the transformation:
    . Access to the IR unit being transformed (getFunction/getModule)
    . Various utilities for pass identification and registration.
* Provide a 'PassResult runOn(Function|Module)()' method to transform the IR.
  - This replaces the runOn* functions from before.
This patch also introduces the notion of the PassManager. This allows for simplified construction of pass pipelines and acts as the sole interface for executing passes. This is important as FunctionPass will no longer have a 'runOnModule' method.
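A minimal sketch of a pass and pipeline under this scheme; the `success()` result value and the exact PassManager method names (`addPass`, `run`) are assumptions for illustration:
```cpp
namespace {
// Sketch of a function pass using the CRTP base described above.
struct MyCleanupPass : public FunctionPass<MyCleanupPass> {
  PassResult runOnFunction() {
    Function *f = getFunction();
    // ... transform `f` ...
    return success();
  }
};
} // namespace

// Sketch of building a simple pipeline with the new PassManager.
void runPipeline(Module *module) {
  PassManager pm;
  pm.addPass(new MyCleanupPass());
  pm.run(module);
}
```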
PiperOrigin-RevId: 235952008
EDSC provides APIs for constructing and modifying the IR. These APIs are
currently tested by a "test" module pass that reads the dummy IR (empty
functions), recognizes certain function names and injects the IR into those
functions based on their name. This situation is unsatisfactory because the
expected outcome of the test lives in a different file than the input to the
test, i.e. the API calls.
Create a new binary for tests that constructs the IR from scratch using EDSC
APIs and prints it. Put FileCheck comments next to the printing. This removes
the need to have a file with dummy inputs and assert on its contents in the
test driver. The test source includes a simplistic test harness that runs all
functions marked as TEST_FUNC but intentionally does not include any
value-testing functionality.
PiperOrigin-RevId: 235886629
LoopFusion
- getConstDifference in LoopFusion is pending a refactoring to handle bounds
with min's and max's; it currently asserts on some useful test cases that we
want to experiment with. This CL changes getSliceBounds to be more
conservative so as to not trigger the assertion. Filed b/126426796 to track this.
PiperOrigin-RevId: 235826538
- clean up loop fusion CL options for promoting local buffers to fast memory
space
- add parameters to loop fusion pass instantiation
PiperOrigin-RevId: 235813419
When lowering to MLIR(LLVMDialect) we unbox the structs that result
from converting static memrefs, that is, singleton structs
that just contain a raw pointer. This allows us to get rid of all
"extractvalue" instructions in the common case where shapes are fully
known.
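An illustration of the descriptor shapes involved; the element type and the names here are assumed, not the actual generated types:
```cpp
// Before: a statically shaped memref lowered to a singleton struct wrapping
// the data pointer, requiring extractvalue to get at the pointer.
struct BoxedStaticMemRef { float *data; };
// After: the struct is unboxed and the raw pointer is passed directly.
using UnboxedStaticMemRef = float *;
```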
PiperOrigin-RevId: 235706021
Since the goal of the LLVM IR dialect is to reflect LLVM IR in MLIR, the
dialect and the conversion procedure must account for the differences between
block arguments and LLVM IR PHI nodes. In particular, LLVM IR disallows PHI
nodes with different values coming from the same source. Therefore, the LLVM IR
dialect now disallows `cond_br` operations that have identical successors
accepting arguments, which would lead to invalid PHI nodes. The conversion
process resolves the potential PHI source ambiguity by injecting dummy blocks
if the same block is used more than once as a successor in an instruction.
These dummy blocks branch unconditionally to the original successors, pass them
the original operands (available in the dummy block because it is dominated by
the original block) and are used instead of them in the original terminator
operation.
PiperOrigin-RevId: 235682798
Addressing post-submit comments. The `getelementptr` operation now supports
non-constant indexes, similarly to LLVM, and this functionality is exercised by
the lowering to the dialect. Update the documentation accordingly.
List the values of integer comparison predicates, which currently correspond to
those of CmpIOp in MLIR. Ideally, we would use strings instead, but it
requires additional support for argument conversion in both the dialect
lowering pass and the LLVM translator.
PiperOrigin-RevId: 235678877
This CL adds a primitive to perform stripmining of a loop by a given factor and
sinking it under multiple target loops.
In turn this is used to implement imperfectly nested loop tiling (with interchange) by repeatedly calling the stripmineSink primitive.
The API returns the point loops and allows repeated invocations of tiling to achieve declarative, multi-level, imperfectly-nested tiling.
Note that this CL is only concerned with the mechanical aspects and does not worry about analysis and legality.
The API is demonstrated in an example which creates an EDSC block, emits the corresponding MLIR and applies imperfectly-nested tiling:
```cpp
// clang-format off
auto block = edsc::block({
  For(ArrayRef<edsc::Expr>{i, j}, {zero, zero}, {M, N}, {one, one}, {
    For(k1, zero, O, one, {
      C({i, j, k1}) = A({i, j, k1}) + B({i, j, k1})
    }),
    For(k2, zero, O, one, {
      C({i, j, k2}) = A({i, j, k2}) + B({i, j, k2})
    }),
  }),
});
// clang-format on
emitter.emitStmts(block.getBody());
auto l_i = emitter.getAffineForOp(i), l_j = emitter.getAffineForOp(j),
     l_k1 = emitter.getAffineForOp(k1), l_k2 = emitter.getAffineForOp(k2);
auto indicesL1 = mlir::tile({l_i, l_j}, {512, 1024}, {l_k1, l_k2});
auto l_ii1 = indicesL1[0][0], l_jj1 = indicesL1[1][0];
mlir::tile({l_jj1, l_ii1}, {32, 16}, l_jj1);
```
The edsc::Expr for the induction variables (i, j, k_1, k_2) provide the programmatic hooks from which tiling can be applied declaratively.
PiperOrigin-RevId: 235548228
Leverage the recently introduced support for multiple argument groups and
multiple destination blocks in EDSC Expressions to implement conditional
branches in EDSC. Conditional branches have two successors and three argument
groups. The first group contains a single expression of i1 type that
corresponds to the condition of the branch. The two following groups contain
arguments of the two successors of the conditional branch instruction, in the
same order as the successors. Expose this instruction to the C API and Python
bindings.
PiperOrigin-RevId: 235542768
The new implementation of blocks was designed to support blocks with arguments.
More specifically, StmtBlock can be constructed with a list of Bindables that
will be bound to block arguments upon construction. Leverage this functionality
to implement branch instructions with arguments.
This additionally requires the statement storage to have a list of successors,
similarly to core IR operations.
Because successor chains can form loops, we need the ability to decouple
block declaration, after which it becomes usable by branch instructions, from
block body definition. This is achieved by creating an empty block and by
resetting its body with a new list of instructions. Note that assigning a
block from another block will not affect any instructions that may have
designated this block as their successor (this behavior is necessary to make
value-type semantics of EDSC types consistent). Combined, one can now write
generators like
  EDSCContext context;
  Type indexType = ...;
  Bindable i(indexType), ii(indexType), zero(indexType), one(indexType);
  StmtBlock loopBlock({i}, {});
  loopBlock.set({ii = i + one,
                 Branch(loopBlock, {ii})});
  MLIREmitter(&builder)
      .bindConstant<ConstantIndexOp>(zero, 0)
      .bindConstant<ConstantIndexOp>(one, 1)
      .emitStmt(Branch(loopBlock, {zero}));
where the emitter will emit the statement and its successors, if present.
PiperOrigin-RevId: 235541892
This came up in post-submit review. Use LLVM's support for outputting APInt
values directly instead of obtaining a 64-bit integer value from APInt, which
will not work for wider integers.
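For illustration, the fix boils down to streaming the APInt itself; `intAttr` and `os` stand for the surrounding attribute and output stream, and the key point is that llvm::APInt has a raw_ostream printer while getSExtValue() only handles values up to 64 bits:
```cpp
#include "llvm/ADT/APInt.h"
#include "llvm/Support/raw_ostream.h"

// Sketch: print the APInt directly instead of narrowing it first.
const llvm::APInt &value = intAttr.getValue();
os << value;                   // works for any bitwidth
// os << value.getSExtValue(); // would assert for integers wider than 64 bits
```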
PiperOrigin-RevId: 235531574
Previously we had `auto pos = std::string::find(...) != std::string::npos` as the
`if` condition to control substring substitution. Instead of the position of the
found substring, `pos` is a boolean value indicating found or not. When that value
is then used as the replacement start position, we always replace starting from
index 0 or 1. If the replaced substring also contains the pattern to be matched,
we end up in an infinite loop.
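In other words, the bug and the fix look like this (standard library only; the function and variable names are illustrative):
```cpp
#include <string>

void replaceFirst(std::string &str, const std::string &pattern,
                  const std::string &replacement) {
  // Buggy: operator precedence makes `pos` the comparison result (0 or 1),
  // not the match position, so replacement always starts at index 0 or 1.
  //   auto pos = str.find(pattern) != std::string::npos;

  // Fixed: compute the position first, then test it.
  auto pos = str.find(pattern);
  if (pos != std::string::npos)
    str.replace(pos, pattern.size(), replacement);
}
```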
PiperOrigin-RevId: 235504681
Analysis - NFC
- refactor AffineExprFlattener (-> SimpleAffineExprFlattener) so that it
doesn't depend on FlatAffineConstraints, and so that FlatAffineConstraints
could be moved out of IR/; the simplification that the IR needs for
AffineExpr's doesn't depend on FlatAffineConstraints
- have AffineExprFlattener derive from SimpleAffineExprFlattener to use for
all Analysis/Transforms purposes; override addLocalFloorDivId in the derived
class
- turn addAffineForOpDomain into a method on FlatAffineConstraints
- turn AffineForOp::getAsValueMap into an AffineValueMap ctor
PiperOrigin-RevId: 235283610
The only reason for starting with a fixedpoint add is that it is the absolute simplest variant and illustrates the level of abstraction I'm aiming for.
The overall flow would be:
1. Determine quantization parameters (out of scope of this cl).
2. Source dialect rules to lower supported math ops to the quantization dialect (out of scope of this cl).
3. Quantization passes: [-quant-convert-const, -quant-lower-uniform-real-math, -quant-lower-unsupported-to-float] (the last one not implemented yet)
4. Target specific lowering of the integral arithmetic ops (roughly at the level of gemmlowp) to more fundamental operations (i.e. calls to gemmlowp, simd instructions, DSP instructions, etc).
How I'm doing this should facilitate implementation of just about any kind of backend except TFLite, which has a very coarse, ad hoc surface area for its quantized kernels. Options there include (I'm not taking an opinion on this - just trying to provide options):
a) Not using any of this: just match q/dbarrier + tf math ops to the supported TFLite quantized op set.
b) Implement the more fundamental integer math ops on TFLite and convert to those instead of the current op set.
Note that I've hand-waved over the process of choosing appropriate quantization parameters. Getting to that next. As you can see, different implementations will likely have different magic combinations of specific math support, and we will need the target system that has been discussed for some of the esoteric cases (i.e. many DSPs only support POT fixedpoint).
Two changes unrelated to the overall goal of this CL that can be broken out if desired:
- Adding optional attribute support to TableGen
- Allowing TableGen native rewrite hooks to return nullptr, signalling that no rewrite has been done.
PiperOrigin-RevId: 235267229
Add a documentation page on the key points of the conversion to LLVM IR. This
focuses on the aspects of conversion that are relevant for integration of the
LLVM IR dialect (and produced LLVM IR that is mostly a one-to-one translation)
into other projects. In particular, it describes the type conversion rules and
the memref model supporting dynamic sizes.
PiperOrigin-RevId: 235190772
The LLVM IR pass was bootstrapped without user documentation, following LLVM's
language reference and existing conversions between MLIR standard operations
and LLVM IR instructions. Provide concise documentation of the LLVM IR dialect
operations. This documentation does not describe the semantics of the
operations, which should match that of LLVM IR, but highlights the structural
differences in operation definitions, in particular using attributes instead of
constant-only values. It also describes pseudo-operations that exist only to
make the LLVM IR dialect self-contained within MLIR.
While it could have been possible to generate operation description from
TableGen, this opts for a more concise format where groups of related
operations are described together.
PiperOrigin-RevId: 235149136
Add support for lowering DivF and RemF to LLVM::FDiv and LLVM::FRem
respectively. The lowering is a trivial one-to-one transformation.
The corresponding operations already existed in the LLVM IR dialect and can be
lowered to the LLVM IR proper. Add the necessary tests for scalar and vector
forms.
PiperOrigin-RevId: 234984608
This change introduces three new operators in EDSC: Div (also exposed
via Expr.__div__ aka /) -- floating-point division, FloorDiv and CeilDiv
for flooring/ceiling index division.
The lowering to LLVM will be implemented in b/124872679.
PiperOrigin-RevId: 234963217
Introduce support for binding MLIR functions as constant expressions. Standard
constant operation supports functions as possible constant values.
Provide C APIs to look up existing named functions in an MLIR module and expose
them to the Python bindings. Provide Python bindings to declare a function in
an MLIR module without defining it and to add a definition given a function
declaration. These declarations are useful when attempting to link MLIR
modules with, e.g., the standard library.
Introduce EDSC support for direct and indirect calls to other MLIR functions.
Internally, an indirect call is always emitted to leverage existing support for
delayed construction of MLIR Values using EDSC Exprs. If the expression is
bound to a constant function (looked up or declared beforehand), MLIR constant
folding will be able to replace an indirect call by a direct call. Currently,
only zero- and one-result functions are supported since we don't have support
for multi-valued expressions in EDSC.
Expose function calling interface to Python bindings on expressions by defining
a `__call__` function accepting a variable number of arguments.
PiperOrigin-RevId: 234959444
* Introduce a OpTrait class in C++ to wrap the TableGen definition;
* Introduce PredOpTrait and rename previous usage of OpTrait to NativeOpTrait;
* PredOpTrait allows specifying a trait of the operation by way of a predicate on the operation. This will be used in the future to create a reusable set of trait building blocks in the definition of operations. E.g., indicating whether two operands have the same type and allowing op requirements to be documented locally by trait composition.
- Some of these building blocks could later evolve into a known fixed set, as LLVM's backends do, but that can be considered with more data.
* Use the modelling to address one verify TODO in a very local manner.
This subsumes the current custom verify specification which will be removed in a separate mechanical CL.
PiperOrigin-RevId: 234827169
A recent change made ConstantOp::build accept a NumericAttr or assert that a
generic Attribute is in fact a NumericAttr. The rationale behind the change
was that NumericAttrs have a type that can be used as the result type of the
constant operation. FunctionAttr also has a type, and it is valid to construct
function-typed constants as exercised by the parser.mlir test. Relax
ConstantOp::build back to take a generic Attribute. In the overload that only
takes an attribute, assert that the Attribute is either a NumericAttr or a
FunctionAttr, because it is necessary to extract the type. In the overload
that takes both the type and the attribute, delegate the attribute type
checking to ConstantOp::verify to prevent non-Builder-based Op construction
mechanisms from creating invalid IR.
PiperOrigin-RevId: 234798569
The recent rework of MLIREmitter switched to using the generic call to
`builder.createOperation` from OperationState instead of individual customized
calls to `builder.create<>`. As a result, regular non-composed affine apply
operations were emitted. Introduce a special case in Expr::build to always
create composed affine maps instead, as it used to be the case before the
rework.
Such special-casing goes against the idea of EDSC generality and extensibility.
Instead, we should consider declaring the composed form canonical for
affine.apply operations and using the builder support for creating operations
and canonicalizing them immediately (ongoing effort).
PiperOrigin-RevId: 234790129
Introduce a type-safe way of building a 'for' loop with max/min bounds in EDSC.
Define new types MaxExpr and MinExpr in C++ EDSC API and expose them to Python
bindings. Use values of these type to construct 'for' loops with max/min in
newly introduced overloads of the `edsc::For` factory function. Note that in C
APIs, we still must expose MaxMinFor as a different function because C has no
overloads. Also note that MaxExpr and MinExpr do _not_ derive from Expr
because they are not allowed to be used in a regular Expr context (which may
produce `affine.apply` instructions not expecting `min` or `max`).
Factory functions `Min` and `Max` in Python can be further overloaded to
produce chains of comparisons and selects on non-index types. This is not
trivial in C++ since overloaded functions cannot differ by the return type only
(`MaxExpr` or `Expr`) and making `MaxExpr` derive from `Expr` defies the
purpose of type-safe construction.
PiperOrigin-RevId: 234786131
MLIR supports 'for' loops with lower(upper) bound defined by taking a
maximum(minimum) of a list of expressions, but does not have first-class affine
constructs for the maximum(minimum). All these expressions must have affine
provenance, similarly to a single-expression bound. Add support for
constructing such loops using EDSC. The expression factory function is called
`edsc::MaxMinFor` to (1) highlight that the maximum(minimum) operation is
applied to the lower(upper) bound expressions and (2) differentiate it from an
`edsc::For` that creates multiple perfectly nested loops (and should arguably
be called `edsc::ForNest`).
PiperOrigin-RevId: 234785996
Introduce a functionality to create EDSC expressions from typed constants.
This complements the current functionality that uses "unbound" expressions and
binds them to a specific constant before emission. It comes in handy in cases
where we want to check whether something is a constant early during construction
rather than late during emission, for example for multiplications and divisions
in affine expressions. This is also consistent with the MLIR vision of constants
being defined by an operation (rather than being special kinds of values in the
IR), by exposing this operation as an EDSC expression.
PiperOrigin-RevId: 234758020
This CL fixes 2 recent issues with EDSCs:
1. the type of the LHS in Stmt::operator=(Expr rhs) should be the same as the (asserted unique) return type;
2. symbols coming from DimOp should be admissible as lower / upper bounds in For
The relevant tests are added.
PiperOrigin-RevId: 234750249
- compute slices precisely where the destination iteration depends on multiple source
iterations (instead of over-approximating to the whole source loop extent)
- update unionBoundingBox to deal with input with non-matching symbols
- reenable disabled backend test case
PiperOrigin-RevId: 234714069
- hoist DMAs past all immediately surrounding loops that the DMA region is
  invariant on; do this at DMA generation time itself
PiperOrigin-RevId: 234628447
Originally, edsc::Expr had a long enum edsc::ExprKind with all supported types
of operations. Recent Expr extensibility support removed the need to specify
supported types in advance. Replace the no-longer-used blocks of enum values
reserved for unary/binary/ternary/variadic expressions with simple values (it
is still useful to know if an expression is, e.g., binary to access it through
a simpler API).
Furthermore, wrap the string comparison now used to identify specific ops into an
`Expr::is_op<>` function template that acts similarly to `Instruction::isa<>`.
Introduce a `{Unary,Binary,Ternary,Variadic}Expr::make<>` function template that
creates an Expression emitting the MLIR Op specified as the template argument.
PiperOrigin-RevId: 234612916
Expose the result types of edsc::Expr, which are now stored for all types of
Exprs and not only for the variadic ones. Require return types when an Expr is
constructed, if it will ever have some. An empty return type list is
interpreted as an Expr that does not create a value (e.g. `return` or `store`).
Conceptually, all edsc::Exprs are now typed, with the type being a (potentially
empty) tuple of return types. Unbound expressions and Bindables must now be
constructed with a specific type they will take. This makes EDSC less
evidently type-polymorphic, but we can still write generic code such as
  Expr sumOfSquares(Expr lhs, Expr rhs) { return lhs * lhs + rhs * rhs; }
and use it to construct different typed expressions as
  sumOfSquares(Bindable(IndexType::get(ctx)), Bindable(IndexType::get(ctx)));
  sumOfSquares(Bindable(FloatType::getF32(ctx)),
               Bindable(FloatType::getF32(ctx)));
On the positive side, we get the following.
1. We can now perform type checking when constructing Exprs rather than during
MLIR emission. Nevertheless, this still duplicates the Op::verify() logic
until we can factor out type checking from it.
2. MLIREmitter is significantly simplified.
3. ExprKind enum is only used for actual kinds of expressions. Data structures
are converging with AbstractOperation, and the users can now create a
VariadicExpr("canonical_op_name", {types}, {exprs}) for any operation, even
an unregistered one without having to extend the enum and make pervasive
changes to EDSCs.
On the negative side, we get the following.
1. Typed bindables are more verbose, even in Python.
2. We lose the ability to do print debugging for higher-level EDSC abstractions
that are implemented as multiple MLIR Ops, for example logical disjunction.
This is the step 2/n towards making EDSC extensible.
***
Move MLIR Op construction from MLIREmitter::emitExpr to Expr::build since Expr
now has sufficient information to build itself.
This is the step 3/n towards making EDSC extensible.
Both of these strive to minimize the amount of irrelevant changes. In
particular, this introduces more complex pretty-printing for affine and binary
expression to make sure tests continue to pass. It also relies on string
comparison to identify specific operations that an Expr produces.
PiperOrigin-RevId: 234609882