This is intended to ease the transition for clients with a lot of
dependencies. It will be removed in the coming weeks.
Differential Revision: https://reviews.llvm.org/D86755
The PDL Interpreter dialect provides a lower-level abstraction compared to the PDL dialect, and is targeted towards low-level optimization and interpreter code generation. The dialect operations encapsulate low-level pattern match and rewrite "primitives", such as navigating the IR (Operation::getOperand), creating new operations (OpBuilder::create), etc. Many of the operations within this dialect also fuse branching control flow with some form of predicate comparison operation. This type of fusion reduces the amount of work that an interpreter must do when executing.
An example of this representation is shown below:
```mlir
// The following high level PDL pattern:
pdl.pattern : benefit(1) {
  %resultType = pdl.type
  %inputOperand = pdl.input
  %root, %results = pdl.operation "foo.op"(%inputOperand) -> %resultType
  pdl.rewrite %root {
    pdl.replace %root with (%inputOperand)
  }
}

// May be represented in the interpreter dialect as follows:
module {
  func @matcher(%arg0: !pdl.operation) {
    pdl_interp.check_operation_name of %arg0 is "foo.op" -> ^bb2, ^bb1
  ^bb1:
    pdl_interp.return
  ^bb2:
    pdl_interp.check_operand_count of %arg0 is 1 -> ^bb3, ^bb1
  ^bb3:
    pdl_interp.check_result_count of %arg0 is 1 -> ^bb4, ^bb1
  ^bb4:
    %0 = pdl_interp.get_operand 0 of %arg0
    pdl_interp.is_not_null %0 : !pdl.value -> ^bb5, ^bb1
  ^bb5:
    %1 = pdl_interp.get_result 0 of %arg0
    pdl_interp.is_not_null %1 : !pdl.value -> ^bb6, ^bb1
  ^bb6:
    pdl_interp.record_match @rewriters::@rewriter(%0, %arg0 : !pdl.value, !pdl.operation) : benefit(1), loc([%arg0]), root("foo.op") -> ^bb1
  }
  module @rewriters {
    func @rewriter(%arg0: !pdl.value, %arg1: !pdl.operation) {
      pdl_interp.replace %arg1 with(%arg0)
      pdl_interp.return
    }
  }
}
```
Differential Revision: https://reviews.llvm.org/D84579
- This utility, which merges a block anywhere into another one, can help inline
single-block regions into other blocks (see the sketch below).
- Modified the patterns test to use the new function.
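A minimal usage sketch, assuming the utility is PatternRewriter::mergeBlockBefore; the pattern itself is hypothetical:
```c++
#include "mlir/IR/PatternMatch.h"
#include "llvm/ADT/STLExtras.h"

// Inline the only block of a one-region, one-block, zero-result op right
// before the op itself, then erase the op.
struct InlineSingleBlockRegion : public mlir::RewritePattern {
  using mlir::RewritePattern::RewritePattern;

  mlir::LogicalResult
  matchAndRewrite(mlir::Operation *op,
                  mlir::PatternRewriter &rewriter) const override {
    if (op->getNumRegions() != 1 || op->getNumResults() != 0)
      return mlir::failure();
    mlir::Region &region = op->getRegion(0);
    if (!llvm::hasSingleElement(region))
      return mlir::failure();
    // Merge the region's block into the parent block, right before `op`,
    // mapping the block arguments to the op's operands.
    rewriter.mergeBlockBefore(&region.front(), op, op->getOperands());
    rewriter.eraseOp(op);
    return mlir::success();
  }
};
```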
Differential Revision: https://reviews.llvm.org/D86251
This changes the behavior of constructing MLIRContext to no longer load globally
registered dialects on construction. Instead, dialects are only loaded explicitly
on demand:
- The parser lazily loads dialects into the context as it encounters them
during parsing. This is the only purpose of registering dialects without
loading them into the context.
- Passes are expected to declare the dialects they will create entities from
(operations, attributes, or types), and the PassManager loads those dialects into
the context when starting a pipeline.
This change simplifies the configuration of registration: a compiler only
needs to load the dialects for the IR it will emit, and the optimizer is
self-contained and loads the required dialects. For example, in the Toy tutorial,
the compiler only needs to load the Toy dialect into the context; all the others
(linalg, affine, std, LLVM, ...) are loaded automatically depending on the
enabled optimization pipeline.
To adjust to this change, stop using the existing dialect registration: the
global registry will be removed soon.
1) For passes, you need to override the method:
virtual void getDependentDialects(DialectRegistry &registry) const {}
and register on the provided registry any dialect that this pass can produce
(see the sketch after this list). Passes defined in TableGen can provide this
list in the dependentDialects list field.
2) For dialects, on construction you can register dependent dialects using the
provided MLIRContext: `context.getOrLoadDialect<DialectName>()`
This is useful if a dialect may canonicalize into, or have interfaces involving,
another dialect.
3) For loading IR, dialects that can appear in the input file must be explicitly
registered with the context. `MlirOptMain()` takes an explicit registry for
this purpose. See how the standalone-opt.cpp example is set up:
mlir::DialectRegistry registry;
registry.insert<mlir::standalone::StandaloneDialect>();
registry.insert<mlir::StandardOpsDialect>();
Only operations from these two dialects can be in the input file. To include all
of the dialects in MLIR Core, you can populate the registry this way:
mlir::registerAllDialects(registry);
4) For `mlir-translate` callbacks, as well as frontends, dialects can be loaded
in the context before emitting the IR: context.getOrLoadDialect<ToyDialect>()
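For illustration, a minimal sketch of point 1), using that era's APIs; the pass and its choice of dialects are illustrative:
```c++
#include "mlir/Dialect/SCF/SCF.h"
#include "mlir/Dialect/StandardOps/IR/Ops.h"
#include "mlir/Pass/Pass.h"

// A pass declares every dialect it may create entities from, so that the
// PassManager can load them into the context before the pipeline runs.
struct MyLoweringPass
    : public mlir::PassWrapper<MyLoweringPass,
                               mlir::OperationPass<mlir::ModuleOp>> {
  void getDependentDialects(mlir::DialectRegistry &registry) const override {
    registry.insert<mlir::StandardOpsDialect, mlir::scf::SCFDialect>();
  }
  void runOnOperation() override { /* lowering logic */ }
};
```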
Differential Revision: https://reviews.llvm.org/D85622
This greatly simplifies a large portion of the underlying infrastructure and allows lookups of singleton classes to be much more efficient and always thread-safe (no locking). As a result, the dialect symbol registry has been removed, as it is no longer necessary.
For users broken by this change, an alert was sent out (https://llvm.discourse.group/t/removing-kinds-from-attributes-and-types) that covers a majority of the breakage surface area. If the advice in that alert was followed, all that should be necessary is removing the kind passed to the ::get methods.
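An illustrative before/after with a hypothetical user-defined type (MyType and MyTypeKinds are invented for the example):
```c++
// The only change needed is dropping the kind passed through to Base::get
// inside the static ::get method.
MyType MyType::get(mlir::MLIRContext *ctx, unsigned width) {
  // Before the change:
  //   return Base::get(ctx, MyTypeKinds::MyTypeKind, width);
  // After the change, the concrete class's TypeID identifies it:
  return Base::get(ctx, width);
}
```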
Differential Revision: https://reviews.llvm.org/D86121
It appears that in this case an implicit cast from StringRef to std::string
doesn't happen. Fixed with an explicit cast (see the sketch below).
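For illustration (the helper below is hypothetical; only StringRef::str() is the real API):
```c++
#include "llvm/ADT/StringRef.h"
#include <string>

std::string demo(llvm::StringRef ref) {
  // Relying on the implicit StringRef -> std::string conversion can fail to
  // compile in some contexts (e.g. when overload resolution or template
  // deduction gets in the way); the explicit form always works:
  return ref.str();
}
```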
Differential Revision: https://reviews.llvm.org/D85986
This revision removes all of the lingering usages of Type::getKind. A consequence of this is that FloatType is now split into 4 derived types that represent each of the possible float types (BFloat16Type, Float16Type, Float32Type, and Float64Type). Other than this split, this revision is NFC.
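Code that previously dispatched on the kind can use the concrete classes directly; a small sketch using that era's casting style:
```c++
#include "mlir/IR/StandardTypes.h"

unsigned bitwidthOf(mlir::Type type) {
  // The four concrete float classes replace kind-based dispatch.
  if (type.isa<mlir::BFloat16Type>() || type.isa<mlir::Float16Type>())
    return 16;
  if (type.isa<mlir::Float32Type>())
    return 32;
  if (type.isa<mlir::Float64Type>())
    return 64;
  return 0; // not a float type
}
```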
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D85566
These hooks were introduced before the Interfaces mechanism was available.
DialectExtractElementHook is unused and is entirely removed. The
DialectConstantFoldHook is used as a fallback in the
operation fold() method, and is replaced by a DialectInterface.
The DialectConstantDecodeHook is used for interpreting OpaqueAttribute
and should be revamped, but it is replaced with an interface in a 1:1 fashion
for now.
Differential Revision: https://reviews.llvm.org/D85595
This reverts commit 9f24640b7e.
We hit some deadlocks on thread exit in some configurations: a TLS exit handler is taking a lock.
Temporarily reverting this change while we debug what is going on.
This also beefs up the test coverage:
- Make unranked memref testing consistent with ranked memrefs.
- Add testing for the invalid element type cases.
This is not quite NFC: index types are now allowed in unranked memrefs.
Differential Revision: https://reviews.llvm.org/D85541
This revision refactors the default definition of the attribute and type `classof` methods to use the TypeID of the concrete class instead of invoking the `kindof` method. The TypeID is already used as part of uniquing, and this allows for removing the need for users to define any of the type casting utilities themselves.
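Schematically, the generated default looks like this (a simplified sketch, not the exact library code):
```c++
// The TypeID recorded when the concrete class registers itself identifies
// its instances, so no user-written kindof() is needed.
template <typename ConcreteT, typename BaseT>
bool defaultClassof(BaseT value) {
  return value.getTypeID() == mlir::TypeID::get<ConcreteT>();
}
```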
Differential Revision: https://reviews.llvm.org/D85356
Subclass data is useful when a certain amount of memory is allocated but not all of it is used. In the case of Type, that hasn't been the case for a while, and the subclass data is just taking up a full `unsigned`. Removing this frees up ~8 bytes for almost every type instance.
Differential Revision: https://reviews.llvm.org/D85348
This class allows for defining thread-local objects that have a non-static lifetime. The internals of the cache use a static thread_local map between the various non-static objects and the desired value type. When a non-static object is destroyed, it simply nulls out its entry in the static map. This leaves an entry in the map, but erases the data for the associated value. The current use cases for this are in the MLIRContext, meaning that the number of items in the static map is ~1-2, which isn't costly enough to warrant the complexity of pruning. If a use case arises that requires pruning the map, the functionality can be added.
This is especially useful in the context of MLIR for implementing thread-local caching of context-level objects that would otherwise have very high lock contention. This revision adds a thread-local cache in the MLIRContext for attributes, identifiers, and types to reduce some of the locking burden. This led to a speedup of several hundred milliseconds when compiling a conversion pass on a very large mlir module (>300K operations).
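A minimal sketch of the described mechanism (hypothetical code, not MLIR's actual ThreadLocalCache; in particular it elides cleaning up entries owned by other threads):
```c++
#include <memory>
#include <unordered_map>

template <typename ValueT>
class ThreadLocalCacheSketch {
public:
  // On destruction, null out this instance's slot: the map entry remains,
  // but the data it held is destroyed.
  ~ThreadLocalCacheSketch() {
    auto &map = getThreadMap();
    auto it = map.find(this);
    if (it != map.end())
      it->second.reset();
  }

  // Each thread lazily constructs its own value for this cache instance,
  // so reads never contend on a shared lock.
  ValueT &get() {
    std::unique_ptr<ValueT> &slot = getThreadMap()[this];
    if (!slot)
      slot = std::make_unique<ValueT>();
    return *slot;
  }

private:
  using Map = std::unordered_map<const void *, std::unique_ptr<ValueT>>;
  static Map &getThreadMap() {
    static thread_local Map map;
    return map;
  }
};
```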
Differential Revision: https://reviews.llvm.org/D82597
This allows for bucketing the different possible storage types, with each bucket having its own allocator/mutex/instance map. This greatly reduces the amount of lock contention when multi-threading is enabled. On some non-trivial .mlir modules (>300K operations), this led to a compile-time decrease of around half a second (>25%) for a single conversion pass.
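A minimal sketch of the bucketing idea (hypothetical code, not the actual StorageUniquer):
```c++
#include <cstddef>
#include <map>
#include <mutex>
#include <typeindex>
#include <unordered_map>

// One shard per storage type: each has its own mutex and instance map, so
// threads uniquing different storage types never contend on the same lock.
struct Shard {
  std::mutex mutex;
  std::unordered_map<std::size_t, void *> instances; // hash key -> storage
  // A per-shard allocator would live here as well.
};

class ShardedUniquerSketch {
public:
  // Assume all shards are created up front (e.g. at registration time), so
  // lookups into `shards` need no lock of their own.
  void registerStorageType(std::type_index storageType) {
    shards.try_emplace(storageType);
  }

  void *getOrCreate(std::type_index storageType, std::size_t hashKey,
                    void *(*create)()) {
    Shard &shard = shards.at(storageType);
    std::lock_guard<std::mutex> lock(shard.mutex);
    auto it = shard.instances.find(hashKey);
    if (it != shard.instances.end())
      return it->second;
    return shard.instances[hashKey] = create();
  }

private:
  std::map<std::type_index, Shard> shards;
};
```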
Differential Revision: https://reviews.llvm.org/D82596
This revision adds a folding pattern that replaces affine.min ops by the actual min value when it can be determined statically from the strides and bounds of the enclosing scf loop.
This matches the type of expressions that Linalg produces during tiling, and simplifies boundary checks. For now the pattern is added to Linalg, which depends on both Affine and SCF (they do not depend on each other).
In the future the pattern will move to a more appropriate place once that is determined.
The canonicalization of AffineMinOp operations in the context of enclosing scf.for and scf.parallel proceeds by:
1. building an affine map where uses of the induction variable of a loop
are replaced by `%lb + %step * floordiv(%iv - %lb, %step)` expressions.
2. checking if any of the results of this affine map divides all the other
results (in which case it is also guaranteed to be the min).
3. replacing the AffineMinOp by the result of (2).
The algorithm is functional in simple parametric tiling cases by using semi-affine maps. However, simplifications of such semi-affine maps are not yet available, so the canonicalization does not succeed in those cases yet.
Differential Revision: https://reviews.llvm.org/D82009
This patch moves the registration to a method on the MLIRContext: getOrCreateDialect<ConcreteDialect>()
This method requires the dialect to provide a static getDialectNamespace()
and to store a TypeID on the Dialect itself, which allows lazily
creating a dialect when it is not yet loaded in the context.
As a side effect, duplicate registration of the same
dialect is no longer an issue.
To limit the boilerplate, TableGen dialect generation is modified to
emit the constructor entirely and to separately invoke an "init()" method
that the user implements (see the sketch below).
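A rough sketch of the resulting shape (the dialect is hypothetical, and the exact TableGen'd output differs):
```c++
#include "mlir/IR/Dialect.h"

class ToySketchDialect : public mlir::Dialect {
public:
  // The constructor (normally TableGen'd) passes the namespace and TypeID
  // to the base class, then calls the user-implemented init().
  explicit ToySketchDialect(mlir::MLIRContext *ctx)
      : mlir::Dialect(getDialectNamespace(), ctx,
                      mlir::TypeID::get<ToySketchDialect>()) {
    init();
  }
  static llvm::StringRef getDialectNamespace() { return "toysketch"; }

private:
  // User-implemented: addOperations<...>(), addTypes<...>(), etc.
  void init();
};

// Lazily creates and loads the dialect if it isn't in the context yet:
//   ctx->getOrCreateDialect<ToySketchDialect>();
```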
Differential Revision: https://reviews.llvm.org/D85495
When any of the memrefs in a structured linalg op has a zero dimension, it becomes dead.
This is consistent with the fact that linalg ops deduce their loop bounds from their operands.
Note however that this is not the case for `tensor<0xelt_type>`, which is a special convention
that must be lowered away into either `memref<elt_type>` or just `elt_type` before this
canonicalization can kick in.
Differential Revision: https://reviews.llvm.org/D85413
- Moved TypeRange into its own header/cpp file, and added hashing support.
- Changed FunctionType::get() and TupleType::get() to use TypeRange (see the sketch below).
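For example (assuming the get() signatures of that era):
```c++
#include "mlir/IR/StandardTypes.h"

// TypeRange lets these getters accept any contiguous range of Type values.
mlir::FunctionType makeFnType(mlir::MLIRContext *ctx) {
  mlir::Type i32 = mlir::IntegerType::get(32, ctx);
  mlir::Type inputs[] = {i32, i32};
  mlir::Type results[] = {i32};
  return mlir::FunctionType::get(llvm::makeArrayRef(inputs),
                                 llvm::makeArrayRef(results), ctx);
}
```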
Differential Revision: https://reviews.llvm.org/D85075
Simplify semi-affine expressions for operations like ceildiv,
floordiv and modulo by a given symbol, by checking divisibility by that
symbol.
Some properties used in the simplification are:
1) Successive floordiv (resp. ceildiv) operations commute:
((expr1 floordiv expr2) floordiv expr3) = ((expr1 floordiv expr3) floordiv expr2)
((expr1 ceildiv expr2) ceildiv expr3) = ((expr1 ceildiv expr3) ceildiv expr2)
If the two operations differ, no simplification is
possible, as there is no property that simplifies expressions like
((expr1 ceildiv expr2) floordiv expr3) or ((expr1 floordiv expr2) ceildiv expr3).
2) If both expr1 and expr2 are divisible by expr3, then:
(expr1 % expr2) / expr3 = ((expr1 / expr3) % (expr2 / expr3))
where / is the division symbol. For example, with expr1 = 10, expr2 = 4,
expr3 = 2: (10 % 4) / 2 = 1 and (10 / 2) % (4 / 2) = 5 % 2 = 1.
3) If expr1 is divisible by expr2, then expr1 % expr2 = 0.
Signed-off-by: Yash Jain <yash.jain@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D84920
- Add getArgumentTypes() to Region (missed from before)
- Adopt Region argument API in `hasMultiplyAddBody`
- Fix 2 typos in comments
Differential Revision: https://reviews.llvm.org/D84807
This revision adds support for much deeper type conversion integration into the conversion process, and enables auto-generating cast operations when necessary. Type conversions are now largely automatically managed by the conversion infra when using a ConversionPattern with a provided TypeConverter. This removes the need for patterns to do type cast wrapping themselves and moves the burden to the infra. This makes it much easier to perform partial lowerings when type conversions are involved, as any lingering type conversions will be automatically resolved/legalized by the conversion infra.
To support this new integration, a few changes have been made to the type materialization API on TypeConverter. Materialization has been split into three separate categories (a registration sketch follows the list):
* Argument Materialization: This type of materialization is used when converting the type of block arguments when calling `convertRegionTypes`. This is useful for contextually inserting additional conversion operations when converting a block argument type, such as when converting the types of a function signature.
* Source Materialization: This type of materialization is used to convert a legal type of the converter into a non-legal type, generally a source type. This may be called when uses of a non-legal type persist after the conversion process has finished.
* Target Materialization: This type of materialization is used to convert a non-legal, or source, type into a legal, or target, type. This type of materialization is used when applying a pattern on an operation, but the types of the operands have not yet been converted.
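A sketch of registering one of these; the callback shape is assumed from that era's API, and the cast op used to materialize is hypothetical:
```c++
#include "mlir/Transforms/DialectConversion.h"

// Register a source materialization: it converts a legal value back to the
// original (source) type when non-legal uses remain after conversion. A
// real converter would use an op that bridges its two type systems.
void addMaterializations(mlir::TypeConverter &converter) {
  converter.addSourceMaterialization(
      [](mlir::OpBuilder &builder, mlir::Type resultType,
         mlir::ValueRange inputs,
         mlir::Location loc) -> llvm::Optional<mlir::Value> {
        if (inputs.size() != 1)
          return llvm::None;
        // Hypothetical cast op bridging the two type systems:
        return builder
            .create<mycast::BridgeCastOp>(loc, resultType, inputs[0])
            .getResult();
      });
}
```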
Differential Revision: https://reviews.llvm.org/D82831
This commit adds functionality needed for the implementation of convolutions with
the linalg.generic op. Since linalg.generic currently expects indexing maps to be
just permutations, the offset indexing needed in convolutions is not possible.
Therefore this commit addresses the issue by adding support for symbols inside
indexing maps, which enables more advanced indexing. An upcoming commit will
solve the problem of computing loop bounds from such maps.
Differential Revision: https://reviews.llvm.org/D83158
Some dialects have semantics that are not well represented by common
SSA structures with dominance constraints. This patch allows
operations to declare the 'kind' of their contained regions.
Currently, two kinds are allowed: "SSACFG" and "Graph". The only
difference between them at the moment is that SSACFG regions are
required to have dominance, while Graph regions are not. The
intention is that this Interface would be
generated by ODS for existing operations, although this has not yet
been implemented. Presumably, if someone were interested in code
generation, we might also have a "CFG" dialect, which defines control
flow, but does not require SSA.
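For a registered op, the interface implementation might look like this (a sketch; the op is hypothetical):
```c++
#include "mlir/IR/RegionKindInterface.h"

// Hypothetical op whose regions are all graph regions; with ODS this would
// be declared on the op definition instead.
class MyGraphOp /* : mlir::Op<MyGraphOp, mlir::RegionKindInterface::Trait> */ {
public:
  // Graph regions are exempt from the dominance requirement.
  static mlir::RegionKind getRegionKind(unsigned index) {
    return mlir::RegionKind::Graph;
  }
};
```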
The new behavior is mostly identical to the previous behavior, since
registered operations without a RegionKindInterface are assumed to
contain SSACFG regions. However, the behavior has changed for
unregistered operations. Previously, these were checked for
dominance; however, the new behavior allows dominance violations, in
order to allow processing unregistered dialects with Graph
regions. One implication of this is that regions in unregistered
operations with more than one op are no longer CSE'd (since it
requires dominance info).
I've also reorganized the LangRef documentation to remove assertions
about "sequential execution", "SSA Values", and "Dominance". Instead,
the core IR is simply "ordered" (i.e. totally ordered) and consists of
"Values". I've also clarified some things about how control flow
passes between blocks in an SSACFG region. Control flow must enter a
region at the entry block and follow terminator operation successors
or be returned to the containing op. Graph regions do not define a
notion of control flow.
See the discussion here:
https://llvm.discourse.group/t/rfc-allowing-dialects-to-relax-the-ssa-dominance-condition/833/53
Differential Revision: https://reviews.llvm.org/D80358
- Arguments of the first block of a region are considered region arguments.
- Add API on the Region class to deal with these arguments directly instead of
using the front() block (see the sketch below).
- Changed several instances of existing code to use this API
- Fixes https://bugs.llvm.org/show_bug.cgi?id=46535
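A small sketch of the API (member names as described above):
```c++
#include "mlir/IR/Region.h"

// Operate on the entry block's arguments without going through front():
// equivalent to iterating region.front().getArguments().
void dumpRegionArgTypes(mlir::Region &region) {
  for (mlir::BlockArgument arg : region.getArguments())
    arg.getType().dump();
}
```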
Differential Revision: https://reviews.llvm.org/D83599
This revision folds vector.transfer operations by updating the `masked` bool array attribute when more unmasked dimensions can be discovered.
Differential revision: https://reviews.llvm.org/D83586
TransposeOp is often followed by ExtractOp.
In certain cases however, it is unnecessary (and even detrimental) to lower a TransposeOp to either a flat transpose (llvm.matrix intrinsics) or to unrolled scalar insert / extract chains.
Providing foldings of ExtractOp mitigates some of the unnecessary complexity.
Differential revision: https://reviews.llvm.org/D83487
Depending on where the 0 dimension is within the shape, the parser will currently reject .mlir generated by the printer.
Differential Revision: https://reviews.llvm.org/D83445
- This eliminates the need to pass an empty `ArrayRef<NamedAttribute>{}` when
no named attributes are required on the function (see the sketch below).
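Illustrative before/after (builder signature assumed):
```c++
#include "mlir/IR/Function.h"

mlir::FuncOp makeFunc(mlir::Location loc, mlir::FunctionType fnType) {
  // Before, an empty attribute list had to be spelled out:
  //   mlir::FuncOp::create(loc, "f", fnType,
  //                        llvm::ArrayRef<mlir::NamedAttribute>{});
  // After, the attribute argument can simply be omitted:
  return mlir::FuncOp::create(loc, "f", fnType);
}
```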
Differential Revision: https://reviews.llvm.org/D83356
This revision adds foldings for ExtractOp operations that come from previous InsertOp operations.
InsertOp has cumulative semantics, where multiple chained inserts are necessary to produce the final value from which the extracts are obtained.
Additionally, TransposeOp may be interleaved and needs to be tracked in order to follow the producer-consumer relationships and properly compute positions.
Differential revision: https://reviews.llvm.org/D83150