When the LLVM IR dialect was implemented, the TableGen operation definition
scheme did not support operations with variadic results. Therefore, the `call`
instruction was split into `call` and `call0` for single- and zero-result
calls, respectively (LLVM does not support multi-result operations). Unify
`call` and `call0` using the recently added TableGen support for operations
with variadic results, and explicitly verify that the new operation has 0 or 1
results. As a side effect, this change enables clean-ups in the conversion to
the LLVM IR dialect, which no longer needs to rely on wrapped LLVM IR void
types when constructing zero-result calls.
PiperOrigin-RevId: 236119197
The original implementation of OptUtils provided two different LLVM IR module
transformers to be used with the MLIR ExecutionEngine: OptimizingTransformer,
parameterized by optimization levels (similar to -O3 flags), and
LLVMPassesTransformer, parameterized by a string formatted like the
command-line options of LLVM's "opt" tool, but without support for -O* flags.
Introduce such support by declaring the flags inside the parser and by
populating the pass managers similarly to what "opt" does. Remove the
additional flags from mlir-cpu-runner as they can now be wrapped into
`-llvm-opts` together with other LLVM-related flags.
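For illustration, this is roughly how the two transformers are obtained; the
factory names and argument lists shown here are assumptions based on the
description above, not the verbatim API:
```cpp
// Sketch only: signatures are assumptions. Each factory returns a callable
// that the ExecutionEngine applies to the generated LLVM IR module before
// JIT compilation.
auto optTransformer =
    mlir::makeOptimizingTransformer(/*optLevel=*/3, /*sizeLevel=*/0);
auto passTransformer =
    mlir::makeLLVMPassesTransformer("-O2 -loop-unroll -loop-vectorize");
```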
PiperOrigin-RevId: 236107292
- detect more trivially redundant constraints in
FlatAffineConstraints::removeTrivialRedundantConstraints. Redundancy due to
constraints that only differ in the constant part (e.g., 32i + 64j - 3 >= 0,
32i + 64j - 8 >= 0) is now detected. The method is still linear-time and does
a single scan over the FlatAffineConstraints buffer. This detection is useful
and needed to eliminate redundant constraints generated after FM elimination.
- update GCDTightenInequalities so that we also normalize by the GCD while at
it. This way more constraints show up as redundant (232i - 203 >= 0 becomes
i - 1 >= 0 instead of 232i - 232 >= 0) without having to call
normalizeConstraintsByGCD (a sketch of the tightening rule appears below).
- In FourierMotzkinEliminate, call GCDTightenInequalities and
normalizeConstraintsByGCD before calling removeTrivialRedundantConstraints(),
so that more redundant constraints are detected. As a result, redundancy
due to constraints like i - 5 >= 0, i - 7 >= 0, 2i - 5 >= 0, 232i - 203 >= 0
is now detected (here only i >= 7 is non-redundant).
As a result of these, a -memref-bound-check on the added test case runs in 16ms
instead of 1.35s (opt build) and no longer returns a conservative result.
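For reference, the GCD tightening rule can be stated as a small standalone
sketch; this is illustrative only, not the MLIR implementation:
```cpp
#include <cstdint>
#include <cstdlib>
#include <numeric>
#include <vector>

// Sketch of the tightening rule: a_0*x_0 + ... + a_{n-1}*x_{n-1} + c >= 0
// can be tightened to sum((a_i/g)*x_i) + floor(c/g) >= 0, where
// g = gcd(|a_0|, ..., |a_{n-1}|). For 232i - 203 >= 0 this yields
// i - 1 >= 0, matching the example above.
void gcdTightenAndNormalize(std::vector<int64_t> &coeffs, int64_t &constant) {
  int64_t g = 0;
  for (int64_t a : coeffs)
    g = std::gcd(g, std::abs(a));
  if (g <= 1)
    return;
  for (int64_t &a : coeffs)
    a /= g;
  // Floor division of the constant term (C++ '/' truncates toward zero).
  constant = (constant % g == 0 || constant > 0) ? constant / g
                                                 : constant / g - 1;
}
```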
PiperOrigin-RevId: 235983550
The definitions of derived passes have now changed and passes must adhere to the following:
* Inherit from a CRTP base class FunctionPass/ModulePass.
- This class provides several necessary utilities for the transformation:
. Access to the IR unit being transformed (getFunction/getModule)
. Various utilities for pass identification and registration.
* Provide a 'PassResult runOn(Function|Module)()' method to transform the IR.
- This replaces the runOn* functions from before.
This patch also introduces the notion of the PassManager. This allows for simplified construction of pass pipelines and acts as the sole interface for executing passes. This is important as FunctionPass will no longer have a 'runOnModule' method.
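A hedged sketch of what this looks like in practice; only FunctionPass,
PassResult, runOnFunction, getFunction, and PassManager are taken from the
description above, while the pass name, success value, and addPass/run calls
are illustrative assumptions:
```cpp
// Sketch only: identifiers not named in the text are assumptions.
struct MyFunctionPass : public FunctionPass<MyFunctionPass> {
  PassResult runOnFunction() {
    // Transform the IR unit returned by getFunction() here.
    return success();
  }
};

// Pass pipelines are now built and executed solely through the PassManager.
void runPipeline(Module *module) {
  PassManager pm;
  pm.addPass(new MyFunctionPass());
  pm.run(module);
}
```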
PiperOrigin-RevId: 235952008
EDSC provides APIs for constructing and modifying the IR. These APIs are
currently tested by a "test" module pass that reads dummy IR (empty
functions), recognizes certain function names, and injects IR into those
functions based on their name. This situation is unsatisfactory because the
expected outcome of the test lives in a different file than the input to the
test, i.e. the API calls.
Create a new binary for tests that constructs the IR from scratch using EDSC
APIs and prints it. Put FileCheck comments next to the printing. This removes
the need to have a file with dummy inputs and assert on its contents in the
test driver. The test source includes a simplistic test harness that runs all
functions marked as TEST_FUNC but intentionally does not include any
value-testing functionality.
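A hedged sketch of the shape of such a test; the function name and the checked
IR are illustrative, not the actual test source:
```cpp
// Illustrative only: build a function from scratch with EDSC APIs, print it,
// and keep the FileCheck expectations right next to the printing code.
// CHECK-LABEL: @simple_add
TEST_FUNC(simple_add) {
  // ... construct a Function* 'f' and emit its body with EDSC ...
  // CHECK: addf
  f->print(llvm::outs());
}
```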
PiperOrigin-RevId: 235886629
LoopFusion
- getConstDifference in LoopFusion is pending a refactoring to handle bounds
with min's and max's; it currently asserts on some useful test cases that we
want to experiment with. This CL changes getSliceBounds to be more
conservative so as to not trigger the assertion. Filed b/126426796 to track this.
PiperOrigin-RevId: 235826538
- clean up loop fusion CL options for promoting local buffers to fast memory
space
- add parameters to loop fusion pass instantiation
PiperOrigin-RevId: 235813419
When lowering to MLIR (LLVM dialect), we unbox the structs that result
from converting statically shaped memrefs, that is, singleton structs
that contain just a raw pointer. This allows us to get rid of all
"extractvalue" instructions in the common case where shapes are fully
known.
PiperOrigin-RevId: 235706021
Since the goal of the LLVM IR dialect is to reflect LLVM IR in MLIR, the
dialect and the conversion procedure must account for the differences between
block arguments and LLVM IR PHI nodes. In particular, LLVM IR disallows PHI
nodes with different values coming from the same source. Therefore, the LLVM IR
dialect now disallows `cond_br` operations that have identical successors
accepting arguments, which would lead to invalid PHI nodes. The conversion
process resolves the potential PHI source ambiguity by injecting dummy blocks
if the same block is used more than once as a successor in an instruction.
These dummy blocks branch unconditionally to the original successors, pass them
the original operands (available in the dummy block because it is dominated by
the original block) and are used instead of them in the original terminator
operation.
PiperOrigin-RevId: 235682798
Addressing post-submit comments. The `getelementptr` operation now supports
non-constant indexes, similarly to LLVM, and this functionality is exercised by
the lowering to the dialect. Update the documentation accordingly.
List the values of integer comparison predicates, which currently correspond to
those of CmpIOp in MLIR. Ideally, we would use strings instead, but it
requires additional support for argument conversion in both the dialect
lowering pass and the LLVM translator.
PiperOrigin-RevId: 235678877
This CL adds a primitive to perform stripmining of a loop by a given factor and
sinking it under multiple target loops.
In turn this is used to implement imperfectly nested loop tiling (with interchange) by repeatedly calling the stripmineSink primitive.
The API returns the point loops and allows repeated invocations of tiling to achieve declarative, multi-level, imperfectly-nested tiling.
Note that this CL is only concerned with the mechanical aspects and does not worry about analysis and legality.
The API is demonstrated in an example which creates an EDSC block, emits the corresponding MLIR and applies imperfectly-nested tiling:
```cpp
// clang-format off
auto block = edsc::block({
  For(ArrayRef<edsc::Expr>{i, j}, {zero, zero}, {M, N}, {one, one}, {
    For(k1, zero, O, one, {
      C({i, j, k1}) = A({i, j, k1}) + B({i, j, k1})
    }),
    For(k2, zero, O, one, {
      C({i, j, k2}) = A({i, j, k2}) + B({i, j, k2})
    }),
  }),
});
// clang-format on
emitter.emitStmts(block.getBody());
auto l_i = emitter.getAffineForOp(i), l_j = emitter.getAffineForOp(j),
     l_k1 = emitter.getAffineForOp(k1), l_k2 = emitter.getAffineForOp(k2);
auto indicesL1 = mlir::tile({l_i, l_j}, {512, 1024}, {l_k1, l_k2});
auto l_ii1 = indicesL1[0][0], l_jj1 = indicesL1[1][0];
mlir::tile({l_jj1, l_ii1}, {32, 16}, l_jj1);
```
The edsc::Expr values for the induction variables (i, j, k1, k2) provide the programmatic hooks from which tiling can be applied declaratively.
PiperOrigin-RevId: 235548228
Leverage the recently introduced support for multiple argument groups and
multiple destination blocks in EDSC Expressions to implement conditional
branches in EDSC. Conditional branches have two successors and three argument
groups. The first group contains a single expression of i1 type that
corresponds to the condition of the branch. The two following groups contain
arguments of the two successors of the conditional branch instruction, in the
same order as the successors. Expose this instruction to the C API and Python
bindings.
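As a hedged illustration in the style of the Branch example in the next note, a
conditional branch might look as follows; the helper name CondBranch and its
exact argument order are hypothetical, inferred from the description above:
```cpp
// Illustrative only: 'cond' is an i1 expression, followed by one argument
// group per successor, in successor order. The Bindables and blocks are
// assumed to have been declared as in the Branch example.
MLIREmitter(&builder)
    .emitStmt(CondBranch(cond, thenBlock, {thenArg}, elseBlock, {elseArg}));
```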
PiperOrigin-RevId: 235542768
The new implementation of blocks was designed to support blocks with arguments.
More specifically, a StmtBlock can be constructed with a list of Bindables that
will be bound to block arguments upon construction. Leverage this functionality
to implement branch instructions with arguments.
This additionally requires the statement storage to have a list of successors,
similarly to core IR operations.
Because successor chains can form loops, we need a way to decouple block
declaration, after which the block becomes usable by branch instructions, from
block body definition. This is achieved by creating an empty block and by
resetting its body with a new list of instructions. Note that assigning a
block from another block will not affect any instructions that may have
designated this block as their successor (this behavior is necessary to keep
the value-type semantics of EDSC types consistent). Combined, one can now write
generators like
```cpp
EDSCContext context;
Type indexType = ...;
Bindable i(indexType), ii(indexType), zero(indexType), one(indexType);
StmtBlock loopBlock({i}, {});
loopBlock.set({ii = i + one,
               Branch(loopBlock, {ii})});
MLIREmitter(&builder)
    .bindConstant<ConstantIndexOp>(zero, 0)
    .bindConstant<ConstantIndexOp>(one, 1)
    .emitStmt(Branch(loopBlock, {zero}));
```
where the emitter will emit the statement and its successors, if present.
PiperOrigin-RevId: 235541892
This came up in post-submit review. Use LLVM's support for outputting APInt
values directly instead of obtaining a 64-bit integer value from APInt, which
will not work for wider integers.
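A hedged illustration of the difference; the helper function is hypothetical,
and only the APInt streaming support comes from LLVM:
```cpp
#include "llvm/ADT/APInt.h"
#include "llvm/Support/raw_ostream.h"

// Illustrative only: streaming the APInt prints correctly at any bit width,
// whereas extracting a 64-bit value first breaks for wider integers.
void printValue(const llvm::APInt &value) {
  llvm::outs() << value << "\n";           // works for, e.g., i128 values
  // llvm::outs() << value.getSExtValue(); // asserts once the value needs
                                           // more than 64 bits
}
```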
PiperOrigin-RevId: 235531574
Previously we had `auto pos = std::string::find(...) != std::string::npos` as
the if condition controlling substring substitution. Instead of the position of
the found substring, `pos` is then a boolean value indicating whether it was
found or not. Using that value as the start position for the replacement meant
we always replaced starting from offset 0 or 1. If the replacement string also
contains the pattern to be matched, we'll see an infinite loop.
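A minimal self-contained sketch of the bug pattern and the fix; the helper name
and signature are illustrative, not the actual code:
```cpp
#include <string>

void replaceAll(std::string &str, const std::string &pattern,
                const std::string &replacement) {
  // Buggy version: due to precedence, 'pos' holds the result of the
  // comparison (0 or 1), not the match position, so replacement always
  // starts at offset 0 or 1 and can loop forever when 'replacement' itself
  // contains 'pattern':
  //   auto pos = str.find(pattern) != std::string::npos;
  //
  // Fixed version: keep the position and advance past the replacement.
  auto pos = str.find(pattern);
  while (pos != std::string::npos) {
    str.replace(pos, pattern.size(), replacement);
    pos = str.find(pattern, pos + replacement.size());
  }
}
```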
PiperOrigin-RevId: 235504681
Analysis - NFC
- refactor AffineExprFlattener (-> SimpleAffineExprFlattener) so that it
doesn't depend on FlatAffineConstraints, and so that FlatAffineConstraints
can be moved out of IR/; the simplification that the IR needs for
AffineExprs doesn't depend on FlatAffineConstraints
- have AffineExprFlattener derive from SimpleAffineExprFlattener to use for
all Analysis/Transforms purposes; override addLocalFloorDivId in the derived
class
- turn addAffineForOpDomain into a method on FlatAffineConstraints
- turn AffineForOp::getAsValueMap into an AffineValueMap ctor
PiperOrigin-RevId: 235283610
The only reason for starting with a fixed-point add is that it is the absolute simplest variant and illustrates the level of abstraction I'm aiming for.
The overall flow would be:
1. Determine quantization parameters (out of scope of this cl).
2. Source dialect rules to lower supported math ops to the quantization dialect (out of scope of this cl).
3. Quantization passes: [-quant-convert-const, -quant-lower-uniform-real-math, -quant-lower-unsupported-to-float] (the last one not implemented yet)
4. Target-specific lowering of the integral arithmetic ops (roughly at the level of gemmlowp) to more fundamental operations (e.g. calls to gemmlowp, SIMD instructions, DSP instructions, etc).
How I'm doing this should facilitate implementation of just about any kind of backend except TFLite, which has a very coarse, ad hoc surface area for its quantized kernels. Options there include (I'm not taking an opinion on this - just trying to provide options):
a) Not using any of this: just match q/dbarrier + tf math ops to the supported TFLite quantized op set.
b) Implement the more fundamental integer math ops on TFLite and convert to those instead of the current op set.
Note that I've hand-waved over the process of choosing appropriate quantization parameters. Getting to that next. As you can see, different implementations will likely have different magic combinations of specific math support, and we will need the target system that has been discussed for some of the esoteric cases (e.g. many DSPs only support POT fixed-point).
Two changes unrelated to the overall goal of this CL that can be broken out if desired:
- Adding optional attribute support to TableGen
- Allowing TableGen native rewrite hooks to return nullptr, signalling that no rewrite has been done.
PiperOrigin-RevId: 235267229
Add a documentation page on the key points of the conversion to LLVM IR. This
focuses on the aspects of conversion that are relevant for integration of the
LLVM IR dialect (and produced LLVM IR that is mostly a one-to-one translation)
into other projects. In particular, it describes the type conversion rules and
the memref model supporting dynamic sizes.
PiperOrigin-RevId: 235190772