This allows for the conversion to match `A(B()) -> C()` with a pattern matching
`A` and marking `B` for deletion.
Also add better assertions when an operation is erased while still having uses.
Differential Revision: https://reviews.llvm.org/D99442
This exposes the ability to register Python functions with the JIT and
exposes them to the MLIR jitted code. The provided test case illustrates
the mechanism.
Differential Revision: https://reviews.llvm.org/D99562
This verification is to check if the indices for static shaped operands
on linalgOps access out of bound memory or not. For dynamic shaped
operands, we would be able to check it on runtime stage.
Found several invalid Linalg ops testcases, and fixed them.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D98390
A new `InterfaceMethod` is added to `InferShapedTypeOpInterface` that
allows an operation to return the `Value`s for each dim of its
results. It is intended for the case where the `Value` returned for
each dim is computed using the operands and operation attributes. This
interface method is for cases where the result dim of an operation can
be computed independently, and it avoids the need to aggregate all
dims of a result into a single shape value. This also implies that
this is not suitable for cases where the result type is unranked (for
which the existing interface methods is to be used).
Also added is a canonicalization pattern that uses this interface and
resolves the shapes of the output in terms of the shapes of the
inputs. Moving Linalg ops to use this interface, so that many
canonicalization patterns implemented for individual linalg ops to
achieve the same result can be removed in favor of the added
canonicalization pattern.
Differential Revision: https://reviews.llvm.org/D97887
* Also adds some verbiage about upgrading `pip` itself, since this is a
common source of issues.
Differential Revision: https://reviews.llvm.org/D99522
Subtensor operations that are taking a slice out of a tensor that is
unit-extent along a dimension can be rewritten to drop that dimension.
Differential Revision: https://reviews.llvm.org/D99226
Drop usage of `emitRemark` and use `notifyMatchFailure` instead to
avoid unnecessary spew during compilation.
Differential Revision: https://reviews.llvm.org/D99485
Convert transfer_read ops with permutation maps into simpler
transfer_read with minority map + vector.braodcast and vector.transpose.
And transfer_read with leading dimensions broacast into transfer_read of
lower rank.
Differential Revision: https://reviews.llvm.org/D99019
Add a new clone operation to the memref dialect. This operation implicitly
copies data from a source buffer to a new buffer. In contrast to the linalg.copy
operation, this operation does not accept a target buffer as an argument.
Instead, this operation performs a conceptual allocation which does not need to
be performed manually.
Furthermore, this operation resolves the dependency from the linalg-dialect
in the BufferDeallocation pass. In addition, we also extended the canonicalization
patterns to fold clone operations. The copy removal pass has been removed.
Differential Revision: https://reviews.llvm.org/D99172
Provide a registration mechanism for Linalg dialect-specific passes in C
API and Python bindings. These are being built into the dialect library
but exposed in separate headers (C) or modules (Python).
Differential Revision: https://reviews.llvm.org/D99431
`concept` is a reserved keyword in C++20, it can't be used as a variable name.
Here is an example of the failure:
```
auto *concept = getInterfaceFor(op);
^
include/mlir/IR/SymbolInterfaces.h.inc:156:12: error: expected expression [clang-diagnostic-error]
if (!concept)
^
```
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D99369
The issue was introduced in D98468.
The `{0}Regions` is an array of `std::unique_ptr<Region>` objects,
so it should be processed accordingly.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D99332
* This has the API I want but I am not thrilled with the implementation. There are various things that could be improved both about the way that Python builders are mapped and the way the Linalg ops are factored to increase code sharing between C++/Python.
* Landing this as-is since it at least makes the InitTensorOp usable with the right API. Will refactor underneath in follow-ons.
Differential Revision: https://reviews.llvm.org/D99000
The `mayNotHaveTerminator` was initially on Block but moved to the
verifier before landing and wasn't removed from its original place
where it is unused.
This reverts commit 361b7d125b by Chris
Lattner <clattner@nondot.org> dated Fri Mar 19 21:22:15 2021 -0700.
The change to the greedy rewriter driver picking a different order was
made without adequate analysis of the trade-offs and experimentation. A
change like this has far reaching consequences on transformation
pipelines, and a major impact upstream and downstream. For eg., one
can’t be sure that it doesn’t slow down a large number of cases by small
amounts or create other issues. More discussion here:
https://llvm.discourse.group/t/speeding-up-canonicalize/3015/25
Reverting this so that improvements to the traversal order can be made
on a clean slate, in bigger steps, and higher bar.
Differential Revision: https://reviews.llvm.org/D99329
In case an operation in a global initializer region refers to another
global variable defined afterwards in the module of itself, translation
to LLVM IR was currently crashing because it could not find the LLVM IR global
when going through the initializer block.
To solve this problem, split global conversion to LLVM IR into two passes. A
first pass that creates LLVM IR global variables, and a second one that converts
the initializer, if any, and adds it to the llvm global.
Differential Revision: https://reviews.llvm.org/D99246
In particular for Graph Regions, the terminator needs is just a
historical artifact of the generalization of MLIR from CFG region.
Operations like Module don't need a terminator, and before Module
migrated to be an operation with region there wasn't any needed.
To validate the feature, the ModuleOp is migrated to use this trait and
the ModuleTerminator operation is deleted.
This patch is likely to break clients, if you're in this case:
- you may iterate on a ModuleOp with `getBody()->without_terminator()`,
the solution is simple: just remove the ->without_terminator!
- you created a builder with `Builder::atBlockTerminator(module_body)`,
just use `Builder::atBlockEnd(module_body)` instead.
- you were handling ModuleTerminator: it isn't needed anymore.
- for generic code, a `Block::mayNotHaveTerminator()` may be used.
Differential Revision: https://reviews.llvm.org/D98468
Lowering of bitwise_not to linalg dialect using a xor operation with a constant
of all-bits-one.
Differential Revision: https://reviews.llvm.org/D99221
For such op chains, we can create new linalg.fill ops
with the result type of the linalg.tensor_reshape op.
Differential Revision: https://reviews.llvm.org/D99116
init tensor operands also has indexing map and generally follow
the same constraints we expect for non-init-tensor operands.
Differential Revision: https://reviews.llvm.org/D99115
This commit exposes an option to the pattern
FoldWithProducerReshapeOpByExpansion to allow
folding unit dim reshapes. This gives callers
more fine-grained controls.
Differential Revision: https://reviews.llvm.org/D99114
This identifies a pattern where the producer affine min/max op
is bound to a dimension/symbol that is used as a standalone
expression in the consumer affine op's map. In that case the
producer affine min/max op can be merged into its consumer.
For example, a pattern like the following:
```
%0 = affine.min affine_map<()[s0] -> (s0 + 16, s0 * 8)> ()[%sym1]
%1 = affine.min affine_map<(d0)[s0] -> (s0 + 4, d0)> (%0)[%sym2]
```
Can be turned into:
```
%1 = affine.min affine_map<
()[s0, s1] -> (s0 + 4, s1 + 16, s1 * 8)> ()[%sym2, %sym1]
```
Differential Revision: https://reviews.llvm.org/D99016
If there are multiple identical expressions in an affine
min/max op's map, we can just keep one.
Differential Revision: https://reviews.llvm.org/D99015
Until now Linalg fusion only allow fusing producers whose operands
are all permutation indexing maps. It's easier to deduce the
subtensor/subview but it is an unnecessary constraint, as in tiling
we have more advanced logic to deduce the subranges even when the
operand is not of permutation indexing maps, e.g., the input operand
for convolution ops.
This patch uses the logic on tiling side to deduce subranges for
fusion. This enables fusing convolution with its consumer ops
when possible.
Along the way, we are now generating proper affine.min ops to guard
against size boundaries, if we cannot be certain they won't be
out of bounds.
Differential Revision: https://reviews.llvm.org/D99014
This is a preparation step to reuse makeTiledShapes in tensor
fusion. Along the way, did some lightweight cleanups.
Differential Revision: https://reviews.llvm.org/D99013
This avoided some conversion overhead on a model in TypeUniquer when
converting from ArrayRef -> TypeRange.
Differential Revision: https://reviews.llvm.org/D99300
A recent filecheck change resulted in better reporting of invalid variables and this test had a couple. This is the second occurence that the first fix missed.
All linalg operations having a region builder shall call it during op creation. Calling it during vectorization is obsolete.
Differential Revision: https://reviews.llvm.org/D99168
Index type is an integer type of target-specific bitwidth present in many MLIR
operations (loops, memory accesses). Converting values of this type to
fixed-size integers has always been problematic. Introduce a data layout entry
to specify the bitwidth of `index` in a given layout scope, defaulting to 64
bits, which is a commonly used assumption, e.g., in constants.
Port builtin-to-LLVM type conversion to use this data layout entry when
converting `index` type and untie it from pointer size. This is particularly
relevant for GPU targets. Keep a possibility to forcibly override the index
type in lowerings.
Depends On D98525
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D98937
Even if the layout specification is missing from an op that supports it, the op
is still expected to provide meaningful responses to data layout queries.
Forward them to the op instead of directly calling the default implementation.
Depends On D98524
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D98525
This is useful for bit-packing types such as vectors and tuples as well as for
exotic architectures that have non-8-bit bytes.
Depends On D98500
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D98524
ModuleOp is a natural place to provide scoped data layout information. However,
it is undesirable for ModuleOp to implement the entirety of
DataLayoutOpInterface because that would require either pushing the interface
inside the IR library instead of a separate library, or putting the default
implementation of the interface as inline functions in headers leading to
binary bloat. Instead, ModuleOp accepts an arbitrary data layout spec attribute
and has a dedicated hook to extract it, and DataLayout is modified to know
about ModuleOp particularities.
Reviewed By: herhut, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98500
Fix the BlockAndValueMapping update that was missing entries for scf.for op's blockIterArgs.
Skip cloning subtensors of the padded tensor as the logic for these is separate.
Add a filter to drop side-effecting ops.
Tests are beefed up to verify the IR is sound in all hoisting configurations for 2-level 3-D tiled matmul.
Differential Revision: https://reviews.llvm.org/D99255
Use new `MemRefType::getMemorySpace` method with generic Attribute
in cases, where there is no specific logic around the memory space.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D99154
This mechanism makes it possible for a dialect to not register all
operations but still answer interface-based queries.
This can useful for dialects that are "open" or connected to an external
system and still interoperate with the compiler. It can also open up the
possibility to have a more extensible compiler at runtime: the compiler
does not need a pre-registration for each operation and the dialect can
inject behavior dynamically.
Reviewed By: rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D93085
Tosa's argmax lowering is representable as a linalg.indexed_generic
operation. Include the lowering to this type for both integer and
floating point types.
Differential Revision: https://reviews.llvm.org/D99137
To match an interface or trait, users currently have to use the `MatchAny` tag. This tag can be quite problematic for compile time for things like the canonicalizer, as the `MatchAny` patterns may get applied to *every* operation. This revision adds better support by bucketing interface/trait patterns based on which registered operations have them registered. This means that moving forward we will only attempt to match these patterns to operations that have this interface registered. Two simplify defining patterns that match traits and interfaces, two new utility classes have been added: OpTraitRewritePattern and OpInterfaceRewritePattern.
Differential Revision: https://reviews.llvm.org/D98986
This provides a simplified way to implement 'matchAndRewrite' style
canonicalization patterns for ops that don't need the full power of
RewritePatterns. Using this style, you can implement a static method
with a signature like:
```
LogicalResult AssertOp::canonicalize(AssertOp op, PatternRewriter &rewriter) {
return success();
}
```
instead of dealing with defining RewritePattern subclasses. This also
adopts this for a few canonicalization patterns in the std dialect to
show how it works.
Differential Revision: https://reviews.llvm.org/D99143
Tiling operations are generic operations with modified indexing. Updated to to
linalg lowerings to perform this lowering.
Differential Revision: https://reviews.llvm.org/D99113
Adds lowerings for matmul and fully_connected. Only supports 2D tensors for inputs and weights, and 1D tensors for bias.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D99211
This revision introduces proper backward slice computation during the hoisting of
PadTensorOp. This allows hoisting padding even across multiple levels of tiling.
Such hoisting requires the proper handling of loop bounds that may depend on enclosing
loop variables.
Differential revision: https://reviews.llvm.org/D98965
This is an assumption that is made in numerous places in the code. In
particular, in the code generated by mlir-tblgen for operand/result accessors
in ops with attr-sized operand or result lists. Make sure to verify this
assumption.
Note that the operation traits are verified before running the custom op
verifier, which can expect the trait verifier to have passed, but some traits
may be verified before the AttrSizedOperand/ResultTrait and should not make
such assumptions.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D99183
The "else" group of an optional element is a collection of elements that get parsed/printed when the anchor of the main element group is *not* present. This is useful when there is a special syntax when an element is not present. The new syntax for an optional element is shown below:
```
optional-group: `(` elements `)` (`:` `(` else-elements `)`)? `?`
```
An example of how this might be used is shown below:
```tablegen
def FooOp : ... {
let arguments = (ins UnitAttr:$foo);
let assemblyFormat = "attr-dict (`foo_is_present` $foo^):(`foo_is_absent`)?";
}
```
would be formatted as such:
```mlir
// When the `foo` attribute is present:
foo.op foo_is_present
// When the `foo` attribute is not present:
foo.op foo_is_absent
```
Differential Revision: https://reviews.llvm.org/D99129
This nicely aligns the naming with RewritePatternSet. This type isn't
as widely used, but we keep a using declaration in to help with
downstream consumption of this change.
Differential Revision: https://reviews.llvm.org/D99131
This doesn't change APIs, this just cleans up the many in-tree uses of these
names to use the new preferred names. We'll keep the old names around for a
couple weeks to help transitions.
Differential Revision: https://reviews.llvm.org/D99127
This maintains the old name to have minimal source impact on downstream codes, and
does not do the huge mechanical patch. I expect the huge mechanical patch to land
sometime this week, but we can keep around the old names for a couple weeks to reduce
impact on downstream projects.
Differential Revision: https://reviews.llvm.org/D99119
This allows adding a C function pointer as a matchAndRewrite style pattern, which
is a very common case. This adopts it in ExpandTanh to show how it reduces a level
of nesting.
We could allow C++ lambdas here, but that doesn't work as well with type inference
in the common case. Instead of:
patterns.insert(convertTanhOp);
you need to specify:
patterns.insert<math::TanhOp>(convertTanhOp);
which is boilerplate'y. Capturing state like this is very uncommon, so we choose
to require clients to define their own structs and use the non-convenience method
when they need to do so.
Differential Revision: https://reviews.llvm.org/D99039
In the original structure, it will try to match CHECK-LABEL first then see if
the subsequent doesn't have the target strings. This is not what we are
expected. We are expecting the two functions which will be deleted should be
matched before CHECK-LABEL. Also fixed the function names.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D99060
Multiply-shift requires wider compute types or CPU specific code to avoid
premature truncation, apply_shift fixes this issue
Also, Tosa's mul op supports different input / output types. Added path that
sign-extends input values to int-32 values before multiplying.
Differential Revision: https://reviews.llvm.org/D99011
- Drop unnecessary occurrences of rewriter.eraseOp: dead linalg ops on tensors should be cleaned up by DCE.
- reimplement the part of Linalg on fusion that constructs the body and block arguments: the previous implementation had too much magic. Instead this spells out all cases explicitly and asserts / introduces TODOs for incorrect cases.
As a consequence, we can use the default traversal order for this pattern.
Differential Revision: https://reviews.llvm.org/D99070
GreedyPatternRewriteDriver was changed from bottom-up traversal to top-down traversal. Not all passes work yet with that change for traversal order. To give some time for fixing, add an option to allow to switch back to bottom-up traversal. Use this option in FusionOfTensorOpsPass which fails otherwise.
Differential Revision: https://reviews.llvm.org/D99059
mlir/lib/Dialect/Shape/IR/Shape.cpp:573:26: warning: loop variable 'shape' is always a copy because the range of type '::mlir::Operation::operand_range' (aka 'mlir::OperandRange') does not return a reference [-Wrange-loop-analysis]
for (const auto &shape : shapes()) {
^
This updates the codebase to pass the context when creating an instance of
OwningRewritePatternList, and starts removing extraneous MLIRContext
parameters. There are many many more to be removed.
Differential Revision: https://reviews.llvm.org/D99028
This reapplies b5d9a3c / https://reviews.llvm.org/D98609 with a one line fix in
processExistingConstants to skip() when erasing a constant we've already seen.
Original commit message:
1) Change the canonicalizer to walk the function in top-down order instead of
bottom-up order. This composes well with the "top down" nature of constant
folding and simplification, reducing iterations and re-evaluation of ops in
simple cases.
2) Explicitly enter existing constants into the OperationFolder table before
canonicalizing. Previously we would "constant fold" them and rematerialize
them, wastefully recreating a bunch fo constants, which lead to pointless
memory traffic.
Both changes together provide a 33% speedup for canonicalize on some mid-size
CIRCT examples.
One artifact of this change is that the constants generated in normal pattern
application get inserted at the top of the function as the patterns are applied.
Because of this, we get "inverted" constants more often, which is an aethetic
change to the IR but does permute some testcases.
Differential Revision: https://reviews.llvm.org/D99006
* Fold SelectOp when both true and false args are same SSA value
* Fold some cmp + select patterns
Differential Revision: https://reviews.llvm.org/D98576
* Do we need a threshold on maximum number of Yeild arguments processed (maximum number of SelectOp's to be generated)?
* Had to modify some old IfOp tests to not get optimized by this pattern
Differential Revision: https://reviews.llvm.org/D98592
This makes the annotation tied to the operand and the use of a keyword
more explicit/readable on what it means.
Differential Revision: https://reviews.llvm.org/D99001
* Moves this out of a test case where it was being developed to good effect and generalizes it.
* Having tried a number of things like this, I think this balances concerns reasonably well.
Differential Revision: https://reviews.llvm.org/D98989
Now that all of the builtin dialect is generated from ODS, its documentation in LangRef can be split out and replaced with references to Dialects/Builtin.md. LangRef is quite crusty right now and should really have a full cleanup done in a followup.
Differential Revision: https://reviews.llvm.org/D98562
This removes the need to construct an APInt for each value, given that it is guaranteed to contain 32 bit elements.
BEGIN_PUBLIC
...text exposed to open source public git repo...
END_PUBLIC
This allows for notifying callers when operations/blocks get erased, which is especially useful for the greedy pattern driver. The current greedy pattern driver "throws away" all information on constants in the operation folder because it doesn't know if they get erased or not. By passing in RewriterBase, we can directly track this and prevent the need for the pattern driver to rediscover all of the existing constants. In some situations this cuts the compile time of the canonicalizer in half.
Differential Revision: https://reviews.llvm.org/D98755
* IRModules.cpp -> (IRCore.cpp, IRAffine.cpp, IRAttributes.cpp, IRTypes.cpp).
* The individual pieces now compile in the 5-15s range whereas IRModules.cpp was starting to approach a minute (didn't capture a before time).
* More fine grained splitting is possible, but this represents the most obvious.
Differential Revision: https://reviews.llvm.org/D98978
Previously low benefit op-specific patterns never had a chance to match
even if high benefit op-agnostic pattern failed to match.
This was already fixed upstream, this commit just adds testscase
Differential Revision: https://reviews.llvm.org/D98513
Handles lowering from the tosa CastOp to the equivalent linalg lowering. It
includes support for interchange between bool, int, and floating point.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D98828
Adds lowerings for logical_* boolean operations. Each of these ops only operate
on booleans allowing simple lowerings.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D98910
* Makes the wrapped functions of the `@linalg_structured_op` decorator callable such that they emit IR imperatively when invoked.
* There are numerous TODOs that I will keep working through to achieve generality.
* Will true up exception handling tests as the feature progresses (for things that are actually errors once everything is implemented).
* Includes the addition of an `isinstance` method on concrete types in the Python API.
Differential Revision: https://reviews.llvm.org/D98754
This change combines for ROCm what was done for CUDA in D97463, D98203, D98360, and D98396.
I did not try to compile SerializeToHsaco.cpp or test mlir/test/Integration/GPU/ROCM because I don't have an AMD card. I fixed the things that had obvious bit-rot though.
Reviewed By: whchung
Differential Revision: https://reviews.llvm.org/D98447
When deleting operations in DCE, the algorithm uses a post-order walk of
the IR to ensure that value uses were erased before value defs. Graph
regions do not have the same structural invariants as SSA CFG, and this
post order walk could delete value defs before uses. This problem is
guaranteed to occur when there is a cycle in the use-def graph.
This change stops DCE from visiting the operations and blocks in any
meaningful order. Instead, we rely on explicitly dropping all uses of a
value before deleting it.
Reviewed By: mehdi_amini, rriddle
Differential Revision: https://reviews.llvm.org/D98919
Add extra `type.isa<FloatType>()` check to `FloatAttr::get(Type, double)` method.
Otherwise it tries to call `type.cast<FloatType>()`, which fails with assertion in Debug mode.
The `!type.isa<FloatType>()` case just redirercts the call to `FloatAttr::get(Type, APFloat)`,
which will perform the actual check and emit appropriate error.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D98764
This adds a tosa.apply_scale operation that handles the scaling operation
common to quantized operatons. This scalar operation is lowered
in TosaToStandard.
We use a separate ApplyScale factorization as this is a replicable pattern
within TOSA. ApplyScale can be reused within pool/convolution/mul/matmul
for their quantized variants.
Tests are added to both tosa-to-standard and tosa-to-linalg-on-tensors
that verify each pass is correct.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D98753
Includes lowering for tosa.concat with indice computation with subtensor insert
operations. Includes tests along two different indices.
Differential Revision: https://reviews.llvm.org/D98813
This reverts commit 32a744ab20.
CI is broken:
test/Dialect/Linalg/bufferize.mlir:274:12: error: CHECK: expected string not found in input
// CHECK: %[[MEMREF:.*]] = tensor_to_memref %[[IN]] : memref<?xf32>
^
`BufferizeAnyLinalgOp` fails because `FillOp` is not a `LinalgGenericOp` and it fails while reading operand sizes attribute.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98671
Do not limit the number of arguments in rewriter pattern.
Introduce separate `FmtStrVecObject` class to handle
format of variadic `std::string` array.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D97839
This covers cases that are not folded away because the extent tensor type
becomes more concrete in the process.
Differential Revision: https://reviews.llvm.org/D98782
This has been a TODO for a while, and prevents breakages for attributes/types that contain floats that can't roundtrip outside of the hex format.
Differential Revision: https://reviews.llvm.org/D98808
This fixes broken JIT functionality on emulator platforms.
With Alex' recent movement towards squashing llvm ir dialects
into target specific dialects, we now must ensure these dialects
are registered to the cpu runner to ensure JIT can lower this
to proper LLVM IR before handing this off to the backend.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D98727
Add a feature to `EnumAttr` definition to generate
specialized Attribute class for the particular enumeration.
This class will inherit `StringAttr` or `IntegerAttr` and
will override `classof` and `getValue` methods.
With this class the enumeration predicate can be checked with simple
RTTI calls (`isa`, `dyn_cast`) and it will return the typed enumeration
directly instead of raw string/integer.
Based on the following discussion:
https://llvm.discourse.group/t/rfc-add-enum-attribute-decorator-class/2252
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D97836
'ForOpIterArgsFolder' can now remove iterator arguments (and corresponding
results) with no use.
Example:
```
%cst = constant 32 : i32
%0:2 = scf.for %arg1 = %lb to %ub step %step iter_args(%arg2 = %arg0, %arg3 = %cst)
-> (i32, i32) {
%1 = addu %arg2, %cst : i32
scf.yield %1, %1 : i32, i32
}
use(%0#0)
```
%arg3 is not used in the block, and its corresponding result `%0#1` has no use,
thus remove the iter argument.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98711
Returning structs directly in LLVM does not necessarily align with the C ABI of
the platform. This might happen to work on Linux but for small structs this
breaks on Windows. With this change, the wrappers work platform independently.
Differential Revision: https://reviews.llvm.org/D98725
Added additional information about the SSA like properties
that has to be fulfilled in the bufferization steps.
Differential Revision: https://reviews.llvm.org/D95522
This commit fixes the lowering of `Affine.IfOp` to `SCF.IfOp` in the
presence of yield values. These changes have been made as a part of
`-lower-affine` pass.
Differential Revision: https://reviews.llvm.org/D98760
Some parameters to attributes and types rely on special comparison routines other than operator== to ensure equality. This revision adds support for those parameters by allowing them to specify a `comparator` code block that determines if `$_lhs` and `$_rhs` are equal. An example of one of these paramters is APFloat, which requires `bitwiseIsEqual` for bitwise comparison (which we want for attribute equality).
Differential Revision: https://reviews.llvm.org/D98473
Supporting ranges in the byte code requires additional complexity, given that a range can't be easily representable as an opaque void *, as is possible with the existing bytecode value types (Attribute, Type, Value, etc.). To enable representing a range with void *, an auxillary storage is used for the actual range itself, with the pointer being passed around in the normal byte code memory. For type ranges, a TypeRange is stored. For value ranges, a ValueRange is stored. The above problem represents a majority of the complexity involved in this revision, the rest is adapting/adding byte code operations to support the changes made to the PDL interpreter in the parent revision.
After this revision, PDL will have initial end-to-end support for variadic operands/results.
Differential Revision: https://reviews.llvm.org/D95723
This revision extends the PDL Interpreter dialect to add support for variadic operands and results, with ranges of these values represented via the recently added !pdl.range type. To support this extension, three new operations have been added that closely match the single variant:
* pdl_interp.check_types : Compare a range of types with a known range.
* pdl_interp.create_types : Create a constant range of types.
* pdl_interp.get_operands : Get a range of operands from an operation.
* pdl_interp.get_results : Get a range of results from an operation.
* pdl_interp.switch_types : Switch on a range of types.
This revision handles adding support in the interpreter dialect and the conversion from PDL to PDLInterp. Support for variadic operands and results in the bytecode will be added in a followup revision.
Differential Revision: https://reviews.llvm.org/D95722
This revision extends the PDL dialect to add support for variadic operands and results, with ranges of these values represented via the recently added !pdl.range type. To support this extension, three new operations have been added that closely match the single variant:
* pdl.operands : Define a range of input operands.
* pdl.results : Extract a result group from an operation.
* pdl.types : Define a handle to a range of types.
Support for these in the pdl interpreter dialect and byte code will be added in followup revisions.
Differential Revision: https://reviews.llvm.org/D95721
This has a numerous amount of benefits, given the overly clunky nature of CreateNativeOp:
* Users can now call into arbitrary rewrite functions from inside of PDL, allowing for more natural interleaving of PDL/C++ and enabling for more of the pattern to be in PDL.
* Removes the need for an additional set of C++ functions/registry/etc. The new ApplyNativeRewriteOp will use the same PDLRewriteFunction as the existing RewriteOp. This reduces the API surface area exposed to users.
This revision also introduces a new PDLResultList class. This class is used to provide results of native rewrite functions back to PDL. We introduce a new class instead of using a SmallVector to simplify the work necessary for variadics, given that ranges will require some changes to the structure of PDLValue.
Differential Revision: https://reviews.llvm.org/D95720
Up until now, results have been represented as additional results to a pdl.operation. This is fairly clunky, as it mismatches the representation of the rest of the IR constructs(e.g. pdl.operand) and also isn't a viable representation for operations returned by pdl.create_native. This representation also creates much more difficult problems when factoring in support for variadic result groups, optional results, etc. To resolve some of these problems, and simplify adding support for variable length results, this revision extracts the representation for results out of pdl.operation in the form of a new `pdl.result` operation. This operation returns the result of an operation at a given index, e.g.:
```
%root = pdl.operation ...
%result = pdl.result 0 of %root
```
Differential Revision: https://reviews.llvm.org/D95719
This adds a new integration test. However, it also
adapts to a recent memref.XXX change for existing tests
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D98680
Lit as it exists today has three hacks that allow users to run tests earlier:
1) An entire test suite can set the `is_early` boolean.
2) A very recently introduced "early_tests" feature.
3) The `--incremental` flag forces failing tests to run first.
All of these approaches have problems.
1) The `is_early` feature was until very recently undocumented. Nevertheless it still lacks testing and is a imprecise way of optimizing test starting times.
2) The `early_tests` feature requires manual updates and doesn't scale.
3) `--incremental` is undocumented, untested, and it requires modifying the *source* file system by "touching" the file. This "touch" based approach is arguably a hack because it confuses editors (because it looks like the test was modified behind the back of the editor) and "touching" the test source file doesn't work if the test suite is read only from the perspective of `lit` (via advanced filesystem/build tricks).
This patch attempts to simplify and address all of the above problems.
This patch formalizes, documents, tests, and defaults lit to recording the execution time of tests and then reordering all tests during the next execution. By reordering the tests, high core count machines run faster, sometimes significantly so.
This patch also always runs failing tests first, which is a positive user experience win for those that didn't know about the hidden `--incremental` flag.
Finally, if users want, they can _optionally_ commit the test timing data (or a subset thereof) back to the repository to accelerate bots and first-time runs of the test suite.
Reviewed By: jhenderson, yln
Differential Revision: https://reviews.llvm.org/D98179
Enhance 'ForOpIterArgsFolder' to remove unused iteration arguments in a
scf::ForOp. If the block argument corresponding to the given iterator has no
use and the yielded value equals the input, we fold it away.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98503
The Intel Advanced Matrix Extensions (AMX) provides a tile matrix
multiply unit (TMUL), a tile control register (TILECFG), and eight
tile registers TMM0 through TMM7 (TILEDATA). This new MLIR dialect
provides a bridge between MLIR concepts like vectors and memrefs
and the lower level LLVM IR details of AMX.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98470