Most of the tests have been ported to be unit-tests and this pass is problematic in the way it depends on TableGen-generated files. This pass is also non-deterministic during multi-threading and a blocker to turning it on by default.
PiperOrigin-RevId: 240889154
Example:
%call:2 = call @multi_return() : () -> (f32, i32)
use(%calltensorflow/mlir#0, %calltensorflow/mlir#1)
This cl also adds parser support for uniquely named result values. This means that a test writer can now write something like:
%foo, %bar = call @multi_return() : () -> (f32, i32)
use(%foo, %bar)
Note: The printer will still print the collapsed form.
PiperOrigin-RevId: 240860058
This CL allows vectorization to be called and configured in other ways than just via command line arguments.
This allows triggering vectorization programmatically.
PiperOrigin-RevId: 240638208
The `Builder*` parameter is unused in both generated build() methods so that we can
leave it unnamed. Changed stand-alone parameter build() to take `_tblgen_state` instead
of `result` to allow `result` to avoid having name collisions with op operand,
attribute, or result.
PiperOrigin-RevId: 240637700
When multi-threading is enabled in the pass manager the meaning of the display
slightly changes. First, a new timing column is added, `User Time`, that
displays the total time spent across all threads. Secondly, the `Wall Time`
column displays the longest individual time spent amongst all of the threads.
This means that the `Wall Time` column will continue to give an indicator on the
perceived time, or clock time, whereas the `User Time` will display the total
cpu time.
Example:
$ mlir-opt foo.mlir -experimental-mt-pm -cse -canonicalize -convert-to-llvmir -pass-timing
===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 0.0078 seconds
---User Time--- ---Wall Time--- --- Name ---
0.0175 ( 88.3%) 0.0055 ( 70.4%) Function Pipeline
0.0018 ( 9.3%) 0.0006 ( 8.1%) CSE
0.0013 ( 6.3%) 0.0004 ( 5.8%) (A) DominanceInfo
0.0017 ( 8.7%) 0.0006 ( 7.1%) FunctionVerifier
0.0128 ( 64.6%) 0.0039 ( 50.5%) Canonicalizer
0.0011 ( 5.7%) 0.0004 ( 4.7%) FunctionVerifier
0.0004 ( 2.1%) 0.0004 ( 5.2%) ModuleVerifier
0.0010 ( 5.3%) 0.0010 ( 13.4%) LLVMLowering
0.0009 ( 4.3%) 0.0009 ( 11.0%) ModuleVerifier
0.0198 (100.0%) 0.0078 (100.0%) Total
PiperOrigin-RevId: 240636269
The spec allows zero-dimensional memrefs to exist and treats them essentially
as single-element buffers. Unlike single-dimensional memrefs of static shape
<1xTy>, zero-dimensional memrefs do not require indices to access the only
element they store. Add support of zero-dimensional memrefs to the LLVM IR
conversion. In particular, such memrefs are converted into bare pointers, and
accesses to them are converted to bare loads and stores, without the overhead
of `getelementptr %buffer, 0`.
PiperOrigin-RevId: 240579456
Due to legacy reasons (ML/CFG function separation), regions in affine control
flow operations require contained blocks not to have terminators. This is
inconsistent with the notion of the block and may complicate code motion
between regions of affine control operations and other regions.
Introduce `affine.terminator`, a special terminator operation that must be used
to terminate blocks inside affine operations and transfers the control back to
he region enclosing the affine operation. For brevity and readability reasons,
allow `affine.for` and `affine.if` to omit the `affine.terminator` in their
regions when using custom printing and parsing format. The custom parser
injects the `affine.terminator` if it is missing so as to always have it
present in constructed operations.
Update transformations to account for the presence of terminator. In
particular, most code motion transformation between loops should leave the
terminator in place, and code motion between loops and non-affine blocks should
drop the terminator.
PiperOrigin-RevId: 240536998
Before this CL, the result type of the pattern match results need to be as same
as the first operand type, operand broadcast type or a generic tensor type.
This CL adds a new trait to set the result type by attribute. For example, the
TFL_ConstOp can use this to set the output type to its value attribute.
PiperOrigin-RevId: 240441249
Currently, regions can only be constructed by passing in a `Function` or an
`Instruction` pointer referencing the parent object, unlike `Function`s or
`Instruction`s themselves that can be created without a parent. It leads to a
rather complex flow in operation construction where one has to create the
operation first before being able to work with its regions. It may be
necessary to work with the regions before the operation is created. In
particular, in `build` and `parse` functions that are executed _before_ the
operation is created in cases where boilerplate region manipulation is required
(for example, inserting the hypothetical default terminator in affine regions).
Allow creating standalone regions. Such regions are meant to own a list of
blocks and transfer them to other regions on demand.
Each instruction stores a fixed number of regions as trailing objects and has
ownership of them. This decreases the size of the Instruction object for the
common case of instructions without regions. Keep this behavior intact. To
allow some flexibility in construction, make OperationState store an owning
vector of regions. When the Builder creates an Instruction from
OperationState, the bodies of the regions are transferred into the
instruction-owned regions to minimize copying. Thus, it becomes possible to
fill standalone regions with blocks and move them to an operation when it is
constructed, or move blocks from a region to an operation region, e.g., for
inlining.
PiperOrigin-RevId: 240368183
inherited constructors, which is cleaner and means you can now use DimOp()
to get a null op, instead of having to use Instruction::getNull<DimOp>().
This removes another 200 lines of code.
PiperOrigin-RevId: 240068113
This should probably be changed to instead use the negated form (e.g., get predicate + negate it + get resulting template), but this fixes it locally.
PiperOrigin-RevId: 240067116
We just need a way to unpack ArrayRef<ValueHandle> to ArrayRef<Value*>.
No need to expose this to the user.
This reduces the cognitive overhead for the tutorial.
PiperOrigin-RevId: 240037425
tblgen be non-const. This requires introducing some const_cast's at the
moment, but those (and lots more stuff) will disappear in subsequent patches.
This significantly simplifies those patches because the various tblgen op emitters
get adjusted.
PiperOrigin-RevId: 239954566
Enable users specifying operand type constraint combinations (e.g., considering multiple operands). Some of these will be refactored (particularly the OpBase change and that should also not be needed to be done by most users), but the focus is more on user side (shown in test). The generated code for this does not take any known facts into account or perform any simplification.
Start with 2 primities to specify 1) whether an operand has a specific element type, and 2) whether an operand's element type matches another operands element type.
PiperOrigin-RevId: 239875712
This CL revisits the composition of AffineApplyOp for the special case where a symbol
itself comes from an AffineApplyOp.
This is achieved by rewriting such symbols into dims to allow composition to occur mathematically.
The implementation is also refactored to improve readability.
Rationale for locally rewriting symbols as dims:
================================================
The mathematical composition of AffineMap must always concatenate symbols
because it does not have enough information to do otherwise. For example,
composing `(d0)[s0] -> (d0 + s0)` with itself must produce
`(d0)[s0, s1] -> (d0 + s0 + s1)`.
The result is only equivalent to `(d0)[s0] -> (d0 + 2 * s0)` when
applied to the same mlir::Value* for both s0 and s1.
As a consequence mathematical composition of AffineMap always concatenates
symbols.
When AffineMaps are used in AffineApplyOp however, they may specify
composition via symbols, which is ambiguous mathematically. This corner case
is handled by locally rewriting such symbols that come from AffineApplyOp
into dims and composing through dims.
PiperOrigin-RevId: 239791597
Previously we emit both op declaration and definition into one file and include it
in *Ops.h. That pulls in lots of implementation details in the header file and we
cannot hide symbols local to implementation. This CL splits them to provide a cleaner
interface.
The way how we define custom builders in TableGen is changed accordingly because now
we need to distinguish signatures and implementation logic. Some custom builders with
complicated logic now can be moved to be implemented in .cpp entirely.
PiperOrigin-RevId: 239509594
This CL fixes an issue where cloned loop induction variables were not properly
propagated and beefs up the corresponding test.
PiperOrigin-RevId: 239422961
This CL introduces a ValueArrayHandle helper to manage the implicit conversion
of ArrayRef<ValueHandle> -> ArrayRef<Value*> by converting first to ValueArrayHandle.
Without this, boilerplate operations that take ArrayRef<Value*> cannot be removed easily.
This all seems to boil down to decoupling Value from Type.
Alternative solutions exist (e.g. MLIR using Value by value everywhere) but they would be very intrusive. This seems to be the lowest impedance change.
Intrinsics are also lowercased by popular demand.
PiperOrigin-RevId: 238974125
This CL removes the dependency of LowerVectorTransfers on the AST version of EDSCs which will be retired.
This exhibited a pretty fundamental staging difference in AST-based vs declarative based emission.
Since the delayed creation with an AST was staged, the loop order came into existence after the clipping expressions were computed.
This now changes as the loops first need to be created declaratively in fixed order and then the clipping expressions are created.
Also, due to lack of staging, coalescing cannot be done on the fly anymore and
needs to be done either as a pre-pass (current implementation) or as a local transformation on the generated IR (future work).
Tests are updated accordingly.
PiperOrigin-RevId: 238971631
* print-ir-before=(comma-separated-pass-list)
- Print the IR before each of the passes provided within the pass list.
* print-ir-before-all
- Print the IR before every pass in the pipeline.
* print-ir-after=(comma-separated-pass-list)
- Print the IR after each of the passes provided within the pass list.
* print-ir-after-all
- Print the IR after every pass in the pipeline.
* print-ir-module-scope
- Always print the Module IR, even for non module passes.
PiperOrigin-RevId: 238523649
- emit a note on the loop being parallel instead of setting a loop attribute
- rename the pass -test-detect-parallel (from -detect-parallel)
PiperOrigin-RevId: 238122847
Add support to create a new attribute from multiple attributes. It extended the
DagNode class to represent attribute creation dag. It also changed the
RewriterGen::emitOpCreate method to support this nested dag emit.
An unit test is added.
PiperOrigin-RevId: 238090229
- fix for getConstantBoundOnDimSize: floordiv -> ceildiv for extent
- make getConstantBoundOnDimSize also return the identifier upper bound
- fix unionBoundingBox to correctly use the divisor and upper bound identified by
getConstantBoundOnDimSize
- deal with loop step correctly in addAffineForOpDomain (covers most cases now)
- fully compose bound map / operands and simplify/canonicalize before adding
dim/symbol to FlatAffineConstraints; fixes false positives in -memref-bound-check; add
test case there
- expose mlir::isTopLevelSymbol from AffineOps
PiperOrigin-RevId: 238050395
multi-result upper bounds, complete TODOs, fix/improve test cases.
- complete TODOs for loop unroll/unroll-and-jam. Something as simple as
"for %i = 0 to %N" wasn't being unrolled earlier (unless it had been written
as "for %i = ()[s0] -> (0)()[%N] to %N"; addressed now.
- update/replace getTripCountExpr with buildTripCountMapAndOperands; makes it
more powerful as it composes inputs into it
- getCleanupLowerBound and getUnrolledLoopUpperBound actually needed the same
code; refactor and remove one.
- reorganize test cases, write previous ones better; most of these changes are
"label replacements".
- fix wrongly labeled test cases in unroll-jam.mlir
PiperOrigin-RevId: 238014653
This CL makes some minor changes to the declarative builder Helpers:
1. adds lb, ub, step methods to MemRefView to avoid always having to go through std::get + range;
2. drops MemRefView& from IndexedValue which was just creating ownership concerns. Instead, an IndexedValue only needs to keep track of the ValueHandle from which a MemRefView can be constructed on-demand if necessary.
PiperOrigin-RevId: 237861493