Depends On D89958
1. Adds `async.group`/`async.awaitall` to group together multiple async tokens/values
2. Rewrite scf.parallel operation into multiple concurrent async.execute operations over non overlapping subranges of the original loop.
Example:
```
scf.for (%i, %j) = (%lbi, %lbj) to (%ubi, %ubj) step (%si, %sj) {
"do_some_compute"(%i, %j): () -> ()
}
```
Converted to:
```
%c0 = constant 0 : index
%c1 = constant 1 : index
// Compute blocks sizes for each induction variable.
%num_blocks_i = ... : index
%num_blocks_j = ... : index
%block_size_i = ... : index
%block_size_j = ... : index
// Create an async group to track async execute ops.
%group = async.create_group
scf.for %bi = %c0 to %num_blocks_i step %c1 {
%block_start_i = ... : index
%block_end_i = ... : index
scf.for %bj = %c0 t0 %num_blocks_j step %c1 {
%block_start_j = ... : index
%block_end_j = ... : index
// Execute the body of original parallel operation for the current
// block.
%token = async.execute {
scf.for %i = %block_start_i to %block_end_i step %si {
scf.for %j = %block_start_j to %block_end_j step %sj {
"do_some_compute"(%i, %j): () -> ()
}
}
}
// Add produced async token to the group.
async.add_to_group %token, %group
}
}
// Await completion of all async.execute operations.
async.await_all %group
```
In this example outer loop launches inner block level loops as separate async
execute operations which will be executed concurrently.
At the end it waits for the completiom of all async execute operations.
Reviewed By: ftynse, mehdi_amini
Differential Revision: https://reviews.llvm.org/D89963
The index type does not have a bitsize and hence the size of corresponding allocations cannot be computed. Instead, the promotion pass now has an explicit option to specify the size of index.
Differential Revision: https://reviews.llvm.org/D91360
This exposes a hook to configure legality of operations such that only
`scf.parallel` operations that have mapping attributes are marked as
illegal. Consequently, the transformation can now also be applied to
mixed forms.
Differential Revision: https://reviews.llvm.org/D91340
A recent refactoring removed the need to interleave verifier passes and instead opted to verify during the normal execution of passes instead. As such, the old verify pass is no longer necessary and can be removed.
Differential Revision: https://reviews.llvm.org/D91212
This revision adds support in the parser/printer for "deferrable" aliases, i.e. those that can be resolved after printing has finished. This allows for printing aliases for operation locations after the module instead of before, i.e. this is now supported:
```
"foo.op"() : () -> () loc(#loc)
#loc = loc("some_location")
```
Differential Revision: https://reviews.llvm.org/D91227
This removes the need to have an explicit `cast<>` given that we always know it `isa` instance of the interface.
Differential Revision: https://reviews.llvm.org/D91304
Some users have native c++ data types that correspond to floating point values stored within a DenseElementsAttr that do not have a corresponding native C++ data type(e.g. bfloat16/half/etc). This revision allows for such users to use those native types directly, and removes the need to go through APFloat when the much faster native value path is available.
Differential Revision: https://reviews.llvm.org/D91402
07f1047f41 changed the CMake detection to use find_package(Python3 ...
but didn't update the lit configuration to use the expected Python3_EXECUTABLE
cmake variable to point to the interpreter path.
This resulted in an empty path on MacOS.
- Move isSupportedMemRefType() to ConvertToLLVMPatterns and check if the
memref element type is supported there.
Differential Revision: https://reviews.llvm.org/D91374
The previous code defined it as allocating a new memref for its result.
However, this is not how it is treated by the dialect conversion framework,
that does the equivalent of inserting and folding it away internally
(even independent of any canonicalization patterns that we have
defined).
The semantics as they were previously written were also very
constraining: Nontrivial analysis is needed to prove that the new
allocation isn't needed for correctness (e.g. to avoid aliasing).
By removing those semantics, we avoid losing that information.
Differential Revision: https://reviews.llvm.org/D91382
We lower them to a std.global_memref (uniqued by constant value) + a
std.get_global_memref to produce the corresponding memref value.
This allows removing Linalg's somewhat hacky lowering of tensor
constants, now that std properly supports this.
Differential Revision: https://reviews.llvm.org/D91306
It was incorrect in the presence of a tensor argument with multiple
uses.
The bufferization of subtensor_insert was writing into a converted
memref operand, but there is no guarantee that the converted memref for
that operand is safe to write into. In this case, the same converted
memref is written to in-place by the subtensor_insert bufferization,
violating the tensor-level semantics.
I left some comments in a TODO about ways forward on this. I will be
working actively on this problem in the coming days.
Differential Revision: https://reviews.llvm.org/D91371
- Remove the default valued arguments from these functions.
- Besides FuncOp, looks like no other in-tree op is using these functions.
Differential Revision: https://reviews.llvm.org/D91369
The tokens are already handled by the lexer. This revision exposes them
through the parser interface.
This revision also adds missing functions for question mark parsing and
completes the list of valid punctuation tokens in the documentation.
Differential Revision: https://reviews.llvm.org/D90907
Add an ODS-backed generator of default builders. This currently does not
support operation with attribute arguments, for which the builder is
just ignored. Attribute support will be introduced separately for
builders and accessors.
Default builders are always generated with the same number of result and
operand groups as the ODS specification, i.e. one group per each operand
or result. Optional elements accept None but cannot be omitted. Variadic
groups accept iterable objects and cannot be replaced with a single
object.
For some operations, it is possible to infer the result type given the
traits, but most traits rely on inline pieces of C++ that we cannot
(yet) forward to Python bindings. Since the Ops where the inference is
possible (having the `SameOperandAndResultTypes` trait or
`TypeMatchesWith` without transform field) are a small minority, they
also require the result type to make the builder syntax more consistent.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D91190
Added documentation about the bufferization features.
Furthermore, the usage of pre- and post-processing is described.
This also includes information about optimization functionalities.
Differential Revision: https://reviews.llvm.org/D90675
This change does two main things
1) An operation might have multiple dependences to the same
producer. Not tracking them correctly can result in incorrect code
generation with fusion. To rectify this the dependence tracking
needs to also have the operand number in the consumer.
2) Improve the logic used to find the fused loops making it easier to
follow. The only constraint for fusion is that linalg ops (on
buffers) have update semantics for the result. Fusion should be
such that only one iteration of the fused loop (which is also a
tiled loop) must touch only one (disjoint) tile of the output. This
could be relaxed by allowing for recomputation that is the default
when oeprands are tensors, or can be made legal with promotion of
the fused view (in future).
Differential Revision: https://reviews.llvm.org/D90579
Exposing the C versions of the methods of the sparse runtime support lib
through header files will enable using the same methods in an MLIR program
as well as a C++ program, which will simplify future benchmarking comparisons
(e.g. comparing MLIR generated code with eigen for Matrix Market sparse matrices).
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D91316
This CL integrates the new sparse annotations (hereto merely added as fully
transparent attributes) more tightly to the generic linalg op in order to add
verification of the annotations' consistency as well as to make make other
passes more aware of their presence (in the long run, rewriting rules must
preserve the integrity of the annotations).
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D91224
Previous the textual form of the pass pipeline would implicitly nest,
instead we opt for the explicit form here: this has less surprise.
This also avoids asserting in the bindings when passing a pass pipeline
with incorrect nesting.
Differential Revision: https://reviews.llvm.org/D91233
If block A and B are in different regions and region of A is not an ancestor of
B, either A is included in region of B or the two regions are disjoint. In both
case A doesn't post-dominate B.
Differential Revision: https://reviews.llvm.org/D91225
The MLIR_ASYNCRUNTIME_EXPORT macro was being defined to be either
__declspec(dllexport) or __declspec(dllimport), depending on whether
mlir_c_runner_utils_EXPORTS is defined. The latter was a copy/paste
error and should have been mlir_async_runtime_EXPORTS.
Additionally, the uses of that macro in the .cpp file were unnecessary,
as only function declarations need to be exported, not their definitions.
Differential Revision: https://reviews.llvm.org/D91196
The previous logic for inlining a region A with N blocks into region B
would produce incorrect results on rollback for N greater than 1. This
rollback logic would leave blocks 1..N in region B and only move block 0
to region A.
The new inlining action recording stores the block move actions from N-1
to 0. Now on roll back, block 0 is moved to region A and then 1..N is
appended to the list of blocks in region A.
Differential Revision: https://reviews.llvm.org/D91185
This adds getters for `llvm.align` and `llvm.noalias` strings that are used
as attribute names in the llvm dialect.
Differential Revision: https://reviews.llvm.org/D91166
I would like to use this for D90589 to switch std.alloc to assemblyFormat.
Hopefully it will be useful in other places as well.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D91068
This patch converts elementwise ops on tensors to linalg.generic ops
with the same elementwise op in the payload (except rewritten to
operate on scalars, obviously). This is a great form for later fusion to
clean up.
E.g.
```
// Compute: %arg0 + %arg1 - %arg2
func @f(%arg0: tensor<?xf32>, %arg1: tensor<?xf32>, %arg2: tensor<?xf32>) -> tensor<?xf32> {
%0 = addf %arg0, %arg1 : tensor<?xf32>
%1 = subf %0, %arg2 : tensor<?xf32>
return %1 : tensor<?xf32>
}
```
Running this through
`mlir-opt -convert-std-to-linalg -linalg-fusion-for-tensor-ops` we get:
```
func @f(%arg0: tensor<?xf32>, %arg1: tensor<?xf32>, %arg2: tensor<?xf32>) -> tensor<?xf32> {
%0 = linalg.generic {indexing_maps = [#map0, #map0, #map0, #map0], iterator_types = ["parallel"]} ins(%arg0, %arg1, %arg2 : tensor<?xf32>, tensor<?xf32>, tensor<?xf32>) {
^bb0(%arg3: f32, %arg4: f32, %arg5: f32): // no predecessors
%1 = addf %arg3, %arg4 : f32
%2 = subf %1, %arg5 : f32
linalg.yield %2 : f32
} -> tensor<?xf32>
return %0 : tensor<?xf32>
}
```
So the elementwise ops on tensors have nicely collapsed into a single
linalg.generic, which is the form we want for further transformations.
Differential Revision: https://reviews.llvm.org/D90354
This patch adds an `ElementwiseMappable` trait as discussed in the RFC
here:
https://llvm.discourse.group/t/rfc-std-elementwise-ops-on-tensors/2113/23
This trait can power a number of transformations and analyses.
A subsequent patch adds a convert-elementwise-to-linalg pass exhibits
how this trait allows writing generic transformations.
See https://reviews.llvm.org/D90354 for that patch.
This trait slightly changes some verifier messages, but the diagnostics
are usually about as good. I fiddled with the ordering of the trait in
the .td file trait lists to minimize the changes here.
Differential Revision: https://reviews.llvm.org/D90731
This only exposes the ability to round-trip a textual pipeline at the
moment.
To exercise it, we also bind the libTransforms in a new Python extension. This
does not include any interesting bindings, but it includes all the
mechanism to add separate native extensions and load them dynamically.
As such passes in libTransforms are only registered after `import
mlir.transforms`.
To support this global registration, the TableGen backend is also
extended to bind to the C API the group registration for passes.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D90819
This patch introduces a new conversion pattern for `spv.ExecutionMode`.
`spv.ExecutionMode` may contain important information about the entry
point, which we want to preserve. For example, `LocalSize` provides
information about the work-group size that can be reused. Hence, the
pattern for entry-point ops changes to the following:
- `spv.EntryPoint` is still simply removed
- Info from `spv.ExecutionMode` is used to create a global struct variable,
which looks like:
```
struct {
int32_t executionMode;
int32_t values[]; // optional values
};
```
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D89989
Introduce an ODS/Tablegen backend producing Op wrappers for Python bindings
based on the ODS operation definition. Usage:
mlir-tblgen -gen-python-op-bindings -Iinclude <path/to/Ops.td> \
-bind-dialect=<dialect-name>
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D90960
Slicing, that is element access with `[being🔚step]` structure, is
a common Python idiom for sequence-like containers. It is also necessary
to support custom accessor for operations with variadic operands and
results (an operation an return a slice of its operands that correspond
to the given variadic group).
Add generic utility to support slicing in Python bindings and use it
for operation operands and results.
Depends On D90923
Reviewed By: stellaraccident, mehdi_amini
Differential Revision: https://reviews.llvm.org/D90936
VectorInsertDynamicOp in SPIRV dialect
conversion from vector.insertelement to spirv VectorInsertDynamicOp
Differential Revision: https://reviews.llvm.org/D90927
Locations often get very long and clutter up operations when printed inline with them. This revision adds support for using aliases with trailing operation locations, and makes printing with aliases the default behavior. Aliases in the trailing location take the form `loc(<alias>)`, such as `loc(#loc0)`. As with all aliases, using `mlir-print-local-scope` can be used to disable them and get the inline behavior.
Differential Revision: https://reviews.llvm.org/D90652
This revision refactors the way that attributes/types are considered when generating aliases. Instead of considering all of the attributes/types of every operation, we perform a "fake" print step that prints the operations using a dummy printer to collect the attributes and types that would actually be printed during the real process. This removes a lot of attributes/types from consideration that generally won't end up in the final output, e.g. affine map attributes in an `affine.apply`/`affine.for`.
This resolves a long standing TODO w.r.t aliases, and helps to have a much cleaner textual output format. As a datapoint to the latter, as part of this change several tests were identified as testing for the presence of attributes aliases that weren't actually referenced by the custom form of any operation.
To ensure that this wouldn't cause a large degradation in compile time due to the second full print, I benchmarked this change on a very large module with a lot of operations(The file is ~673M/~4.7 million lines long). This file before this change take ~6.9 seconds to print in the custom form, and ~7 seconds after this change. In the custom assembly case, this added an average of a little over ~100 miliseconds to the compile time. This increase was due to the way that argument attributes on functions are structured and how they get printed; i.e. with a better representation the negative impact here can be greatly decreased. When printing in the generic form, this revision had no observable impact on the compile time. This benchmarking leads me to believe that the impact of this change on compile time w.r.t printing is closely related to `print` methods that perform a lot of additional/complex processing outside of the OpAsmPrinter.
Differential Revision: https://reviews.llvm.org/D90512
For consistency with the IRBuilder, OpenMPIRBuilder has method names starting with 'Create'. However, the LLVM coding style has methods names starting with lower case letters, as all other OpenMPIRBuilder already methods do. The clang-tidy configuration used by Phabricator also warns about the naming violation, adding noise to the reviews.
This patch renames all `OpenMPIRBuilder::CreateXYZ` methods to `OpenMPIRBuilder::createXYZ`, and updates all in-tree callers.
I tested check-llvm, check-clang, check-mlir and check-flang to ensure that I did not miss a caller.
Reviewed By: mehdi_amini, fghanim
Differential Revision: https://reviews.llvm.org/D91109
This allows us to omit one level of indirection when querying
the information from the underlying attribute.
Reviewed By: hanchung, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D91080
The pass combines patterns of ExpandAtomic, ExpandMemRefReshape,
StdExpandDivs passes. The pass is meant to legalize STD for conversion to LLVM.
Differential Revision: https://reviews.llvm.org/D91082
- Change syntax for FuncOp to be `func <visibility>? @name` instead of printing the
visibility in the attribute dictionary.
- Since printFunctionLikeOp() and parseFunctionLikeOp() are also used by other
operations, make the "inline visibility" an opt-in feature.
- Updated unit test to use and check the new syntax.
Differential Revision: https://reviews.llvm.org/D90859
- Convert `global_memref` to LLVM::GlobalOp.
- Convert `get_global_memref` to a memref descriptor with a pointer to the first element
of the global stashed in it.
- Extend unit test and a mlir-cpu-runner test to validate the generated LLVM IR.
Differential Revision: https://reviews.llvm.org/D90803
- When a block is not empty and does not end with a terminator, flag the error on the
last operation of the block instead of the start of the block.
Differential Revision: https://reviews.llvm.org/D90988
The legalization did not forward the listener which prevents dynamic
legalization and prevents rollbacks. This handled that and then changed
the associated pass to support all other std ops to support partial
conversion.
Previously, this lowering was failing, but due to the
initial bug, the op's modifications were not reverted, and thus the
pattern matching succeeded.
Differential Revision: https://reviews.llvm.org/D91079
Enumerating elements in these classes is necessary to enable custom
operand accessors for variadic operands.
Depends On D90919
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D90923
Operations in a MLIR have a dictionary of attributes attached. Expose
those to Python bindings through a pseudo-container that can be indexed
either by attribute name, producing a PyAttribute, or by a contiguous
index for enumeration purposes, producing a PyNamedAttribute.
Depends On D90917
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D90919
* Wires them in the same way that peer-dialect test passes are registered.
* Fixes the build for -DLLVM_INCLUDE_TESTS=OFF.
Differential Revision: https://reviews.llvm.org/D91022
Since SPIR-V module has an optional name, this patch
makes a change to pass it to `ModuleOp` during conversion.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D90904
The tests are intended to exercise the public C API and will link to a
specific shared library exposing only the C API, this library itself may
link to libMLIR.so.
If we link some LLVM library statically in the test themselves, we end
up with duplicated cl::opt registrations in LLVM. A possible setup if
these libraries were needed could be to link libMLIR.so directly when
available and link statically when it isn't available (in which case the
libary exposing the C API would be statically link and isolated from the
cl::opt registry, hopefully).
Differential Revision: https://reviews.llvm.org/D90993
I ran into this pattern when converting elementwise ops like
`addf %arg0, %arg : tensor<?xf32>` to linalg. Redundant arguments can
also easily arise from linalg-fusion-for-tensor-ops.
Also, fix some small bugs in the logic in
LinalgStructuredOpsInterface.td.
Differential Revision: https://reviews.llvm.org/D90812
The PyOpOperands container was erroneously constructing objects for
individual operands as PyOpResult. Operands in fact are just values,
which may or may not be results of another operation. The code would
eventually crash if the operand was a block argument. Add a test that
exercises the behavior that previously led to crashes.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D90917
We were discussing on discord regarding the need for extension-based systems like Python to dynamically link against MLIR (or else you can only have one extension that depends on it). Currently, when I set that up, I piggy-backed off of the flag that enables build libLLVM.so and libMLIR.so and depended on libMLIR.so from the python extension if shared library building was enabled. However, this is less than ideal.
In the current setup, libMLIR.so exports both all symbols from the C++ API and the C-API. The former is a kitchen sink and the latter is curated. We should be splitting them and for things that are properly factored to depend on the C-API, they should have the option to *only* depend on the C-API, and we should build that shared library no matter what. Its presence isn't just an optimization: it is a key part of the system.
To do this right, I needed to:
* Introduce visibility macros into mlir-c/Support.h. These should work on both *nix and windows as-is.
* Create a new libMLIRPublicAPI.so with just the mlir-c object files.
* Compile the C-API with -fvisibility=hidden.
* Conditionally depend on the libMLIR.so from libMLIRPublicAPI.so if building libMLIR.so (otherwise, also links against the static libs and will produce a mondo libMLIRPublicAPI.so).
* Disable re-exporting of static library symbols that come in as transitive deps.
This gives us a dynamic linked C-API layer that is minimal and should work as-is on all platforms. Since we don't support libMLIR.so building on Windows yet (and it is not very DLL friendly), this will fall back to a mondo build of libMLIRPublicAPI.so, which has its uses (it is also the most size conscious way to go if you happen to know exactly what you need).
Sizes (release/stripped, Ubuntu 20.04):
Shared library build:
libMLIRPublicAPI.so: 121Kb
_mlir.cpython-38-x86_64-linux-gnu.so: 1.4Mb
mlir-capi-ir-test: 135Kb
libMLIR.so: 21Mb
Static build:
libMLIRPublicAPI.so: 5.5Mb (since this is a "static" build, this includes the MLIR implementation as non-exported code).
_mlir.cpython-38-x86_64-linux-gnu.so: 1.4Mb
mlir-capi-ir-test: 44Kb
Things like npcomp and circt which bring their own dialects/transforms/etc would still need the shared library build and code that links against libMLIR.so (since it is all C++ interop stuff), but hopefully things that only depend on the public C-API can just have the one narrow dep.
I spot checked everything with nm, and it looks good in terms of what is exporting/importing from each layer.
I'm not in a hurry to land this, but if it is controversial, I'll probably split off the Support.h and API visibility macro changes, since we should set that pattern regardless.
Reviewed By: mehdi_amini, benvanik
Differential Revision: https://reviews.llvm.org/D90824
There exists a generic folding facility that folds the operand of a memref_cast
into users of memref_cast that support this. However, it was not used for the
memref_cast itself. Fix it to enable elimination of memref_cast chains such as
%1 = memref_cast %0 : A to B
%2 = memref_cast %1 : B to A
that is achieved by combining the folding with the existing "A to A" cast
elimination.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D90910
The CMake macro refactoring had a hardcoded value left instead of using
the function argument.
Didn't catch it locally before because it required a clean build to
trigger.
This target will depend on each individual extension and represent "all"
Python bindings in the repo. User projects can get a finer grain control by
depending directly on some individual targets as needed.
The Python bindings now require -DLLVM_BUILD_LLVM_DYLIB=ON to build.
This change is needed to be able to build multiple Python native
extension without having each of them embedding a copy of MLIR, which
would make them incompatible with each other. Instead they should all
link to the same copy of MLIR.
Differential Revision: https://reviews.llvm.org/D90813
This functionality is superceded by BufferResultsToOutParams pass (see
https://reviews.llvm.org/D90071) for users the require buffers to be
out-params. That pass should be run immediately after all tensors are gone from
the program (before buffer optimizations and deallocation insertion), such as
immediately after a "finalizing" bufferize pass.
The -test-finalizing-bufferize pass now defaults to what used to be the
`allowMemrefFunctionResults=true` flag. and the
finalizing-bufferize-allowed-memref-results.mlir file is moved
to test/Transforms/finalizing-bufferize.mlir.
Differential Revision: https://reviews.llvm.org/D90778
TestDialect has many operations and they all live in ::mlir namespace.
Sometimes it is not clear whether the ops used in the code for the test passes
belong to Standard or to Test dialects.
Also, with this change it is easier to understand what test passes registered
in mlir-opt are actually passes in mlir/test.
Differential Revision: https://reviews.llvm.org/D90794
The test file is a long list of functions, followed by equally long FileCheck
comments inside "main". Distribute FileCheck comments closer to the functions
that produce the output we are checking.
Reviewed By: mehdi_amini, stellaraccident
Differential Revision: https://reviews.llvm.org/D90743
The LinalgDependenceGraph and alias analysis provide the necessary analysis for the Linalg fusion on buffers case.
However this is not enough for linalg on tensors which require proper memory effects to play nicely with DCE and other transformations.
This revision adds side effects to Linalg ops that were previously missing and has 2 consequences:
1. one example in the copy removal pass now fails since the linalg.generic op has side effects and the pass does not perform alias analysis / distinguish between reads and writes.
2. a few examples in fusion-tensor.mlir need to return the resulting tensor otherwise DCE automatically kicks in as part of greedy pattern application.
Differential Revision: https://reviews.llvm.org/D90762
VectorExtractDynamicOp in SPIRV dialect
conversion from vector.extractelement to spirv VectorExtractDynamicOp
Differential Revision: https://reviews.llvm.org/D90679
Per spec, vector sizes 8 and 16 are allowed when Vector16 capability is present.
This change expands the limitation of vector sizes to accept these sizes.
Differential Revision: https://reviews.llvm.org/D90683
The previous behavior was fragile when building an OpPassManager using a
string, as it was forcing the client to ensure the string to outlive the
entire PassManager.
This isn't a performance sensitive area either that would justify
optimizing further.
- The ODS description was using an old syntax that was updated during the review.
This fixes the ODS description to match the current syntax.
Differential Revision: https://reviews.llvm.org/D90797
- Eliminate duplicated information about mapping from memref -> its descriptor fields
by consolidating that mapping in two functions: getMemRefDescriptorFields and
getUnrankedMemRefDescriptorFields.
- Change convertMemRefType() and convertUnrankedMemRefType() to use these
functions.
- Remove convertMemrefSignature and convertUnrankedMemrefSignature.
Differential Revision: https://reviews.llvm.org/D90707
Previously, linalg-bufferize was a "finalizing" bufferization pass (it
did a "full" conversion). This wasn't great because it couldn't be used
composably with other bufferization passes like std-bufferize and
scf-bufferize.
This patch makes linalg-bufferize a composable bufferization pass.
Notice that the integration tests are switched over to using a pipeline
of std-bufferize, linalg-bufferize, and (to finalize the conversion)
func-bufferize. It all "just works" together.
While doing this transition, I ran into a nasty bug in the 1-use special
case logic for forwarding init tensors. That logic, while
well-intentioned, was fundamentally flawed, because it assumed that if
the original tensor value had one use, then the converted memref could
be mutated in place. That assumption is wrong in many cases. For
example:
```
%0 = some_tensor : tensor<4xf32>
br ^bb0(%0, %0: tensor<4xf32>, tensor<4xf32>)
^bb0(%bbarg0: tensor<4xf32>, %bbarg1: tensor<4xf32>)
// %bbarg0 is an alias of %bbarg1. We cannot safely write
// to it without analyzing uses of %bbarg1.
linalg.generic ... init(%bbarg0) {...}
```
A similar example can happen in many scenarios with function arguments.
Even more sinister, if the converted memref is produced by a
`std.get_global_memref` of a constant global memref, then we might
attempt to write into read-only statically allocated storage! Not all
memrefs are writable!
Clearly, this 1-use check is not a local transformation that we can do
on the fly in this pattern, so I removed it.
The test is now drastically shorter and I basically rewrote the CHECK
lines from scratch because:
- the new composable linalg-bufferize just doesn't do as much, so there
is less to test
- a lot of the tests were related to the 1-use check, which is now gone,
so there is less to test
- the `-buffer-hoisting -buffer-deallocation` is no longer mixed in, so
the checks related to that had to be rewritten
Differential Revision: https://reviews.llvm.org/D90657
When the "after" region of a WhileOp is merely forwarding its arguments back to
the "before" region, i.e. WhileOp is a canonical do-while loop, a simpler CFG
subgraph that omits the "after" region with its extra branch operation can be
produced. Loop rotation from general "while" to "if { do-while }" is left for a
future canonicalization pattern when it becomes necessary.
Differential Revision: https://reviews.llvm.org/D90604
The lowering is a straightforward inlining of the "before" and "after" regions
connected by (conditional) branches. This plugs the WhileOp into the
progressive lowering scheme. Future commits may choose to target WhileOp
instead of CFG when lowering ForOp.
Differential Revision: https://reviews.llvm.org/D90603
The new construct represents a generic loop with two regions: one executed
before the loop condition is verifier and another after that. This construct
can be used to express both a "while" loop and a "do-while" loop, depending on
where the main payload is located. It is intended as an intermediate
abstraction for lowering, which will be added later. This form is relatively
easy to target from higher-level abstractions and supports transformations such
as loop rotation and LICM.
Differential Revision: https://reviews.llvm.org/D90255
* All functions that return an Operation now return an OpView.
* All functions that accept an Operation now accept an _OperationBase, which both Operation and OpView extend and can resolve to the backing Operation.
* Moves user-facing instance methods from Operation -> _OperationBase so that both can have the same API.
* Concretely, this means that if there are custom op classes defined (i.e. in Python), any iteration or creation will return the appropriate instance (i.e. if you get/create an std.addf, you will get an instance of the mlir.dialects.std.AddFOp class, getting full access to any custom API it exposes).
* Refactors all __eq__ methods after realizing the proper way to do this for _OperationBase.
Differential Revision: https://reviews.llvm.org/D90584
This is useful in C source files where it is easy for a typo to be
silently assumed by the compiler to be an implicit declaration.
Differential Revision: https://reviews.llvm.org/D90727
This delegate the control of the buffering to the user of the API. This
seems like a safer option as messages are immediately propagated to the
user, which may lead to less surprising behavior during debugging for
instance.
In terms of performance, a user can add a buffered stream on the other
side of the callback.
Differential Revision: https://reviews.llvm.org/D90726
This is exposing the basic functionalities (create, nest, addPass, run) of
the PassManager through the C API in the new header: `include/mlir-c/Pass.h`.
In order to exercise it in the unit-test, a basic TableGen backend is
also provided to generate a simple C wrapper around the pass
constructor. It is used to expose the libTransforms passes to the C API.
Reviewed By: stellaraccident, ftynse
Differential Revision: https://reviews.llvm.org/D90667
- Verify that attributes parsed using a custom parser do not have duplicates.
- If there are duplicated in the attribute dictionary in the input, they get caught during the
dictionary parsing.
- This check verifies that there is no duplication between the parsed dictionary and any
attributes that might be added by the custom parser (or when the custom parsing code
adds duplicate attributes).
- Fixes https://bugs.llvm.org/show_bug.cgi?id=48025
Differential Revision: https://reviews.llvm.org/D90502
Previously, they were only defined for `FuncOp`.
To support this, `FunctionLike` needs a way to get an updated type
from the concrete operation. This adds a new hook for that purpose,
called `getTypeWithoutArgsAndResults`.
For now, `FunctionLike` continues to assume the type is
`FunctionType`, and concrete operations that use another type can hide
the `getType`, `setType`, and `getTypeWithoutArgsAndResults` methods.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D90363
The OpenMP dialect include is only needed for translation
and is not required in LLVM dialect.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D90510
* Use function_ref instead of std::function in several methods
* Use ::get instead of ::getChecked for IntegerType.
- It is already fully verified and constructing a mlir::Location can be extremely costly during parsing.
BufferizeTests.
Summary:
Added test operations to replace the LinalgDialect dependency in tests
which use the buffer-deallocation, buffer-hoisting,
buffer-loop-hoisting, promote-buffers-to-stack,
buffer-placement-preparation-allowed-memref-resutls and
buffer-placement-preparation pass. Adapted the corresponding tests cases
and TestBufferPlacement.cpp.
Differential Revision: https://reviews.llvm.org/D90037
This is an error prone behavior, I frequently have ~20 min debugging sessions when I hit
an unexpected implicit nesting. This default makes the C++ API safer for users.
Depends On D90669
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D90671
This simplifies a few parts of the pass manager, but in particular we don't add as many
verifierpass as there are passes in the pipeline, and we can now enable/disable the
verifier after the fact on an already built PassManager.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D90669
This header was an initial early attempt at a crude C API for bindings,
but it isn't used and redundant with the new API. At this point it only
contributes to more confusion.
Differential Revision: https://reviews.llvm.org/D90643
Because cstr operations allow more instruction reordering than asserts, we only
lower cstr_broadcastable to std ops with cstr_require. This ensures that the
more drastic lowering to asserts can happen specifically with the user's desire.
Differential Revision: https://reviews.llvm.org/D89325
This patch renames AffineParallelNormalize to AffineLoopNormalize to make it
more generic and be able to hold more loop normalization transformations in
the future for affine.for and affine.parallel ops. Eventually, it could also be
extended to support scf.for and scf.parallel. As a starting point for affine.for,
the patch also adds support for removing single iteration affine.for ops to the
the pass.
Differential Revision: https://reviews.llvm.org/D90267
This revision refactors the base Op/AbstractOperation classes to reduce the amount of generated code size when defining a new operation. The current scheme involves taking the address of functions defined directly on Op and Trait classes. This is problematic because even when these functions are empty/unused we still result in these functions being defined in the main executable. In this revision, we switch to using SFINAE and template type filtering to remove remove functions that are not needed/used. For example, if an operation does not define a custom `print` method we shouldn't define a templated `printAssembly` method for it. The same applies to parsing/folding/verification/etc. This dropped MLIR code size for a large downstream library by ~10%(~1 mb in an opt build).
Differential Revision: https://reviews.llvm.org/D90196
Looks like we have a blind spot in the testing matrix.
AsyncRegionRewriter.cpp: In member function ‘virtual void {anonymous}::GpuAsyncRegionPass::runOnFunction()’:
AsyncRegionRewriter.cpp:113:16: internal compiler error: in replace_placeholders_r, at cp/tree.c:2804
if (getFunction()
~~~~~~~~~~~~~
.getRegion()
~~~~~~~~~~~~
.walk(Callback{OpBuilder{&getContext()}})
~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Add standard dialect operations to define global variables with memref types and to
retrieve the memref for to a named global variable
- Extend unit tests to test verification for these operations.
Differential Revision: https://reviews.llvm.org/D90337
BufferPlacement is no longer part of bufferization. However, this test
is an important test of "finalizing" bufferize passes.
A "finalizing" bufferize conversion is one that performs a "full"
conversion and expects all tensors to be gone from the program. This in
particular involves rewriting funcs (including block arguments of the
contained region), calls, and returns. The unique property of finalizing
bufferization passes is that they cannot be done via a local
transformation with suitable materializations to ensure composability
(as other bufferization passes do). For example, if a call is
rewritten, the callee needs to be rewritten otherwise the IR will end up
invalid. Thus, finalizing bufferization passes require an atomic change
to the entire program (e.g. the whole module).
This new designation makes it clear also that it shouldn't be testing
bufferization of linalg ops, so the tests have been updated to not use
linalg.generic ops. (linalg.copy is still used as the "copy" op for
copying into out-params)
Differential Revision: https://reviews.llvm.org/D89979
This is the most basic possible finalizing bufferization pass, which I
also think is sufficient for most new use cases. The more concentrated
nature of this pass also greatly clarifies the invariants that it
requires on its input to safely transform the program (see the
pass description in Passes.td).
With this pass, I have now upstreamed practically all of the
bufferizations from npcomp (the exception being std.constant, which can
be upstreamed when std.global_memref lands:
https://llvm.discourse.group/t/rfc-global-variables-in-mlir/2076/16 )
Differential Revision: https://reviews.llvm.org/D90205
Leaking macros isn't a good practice when defining headers. This
requires to duplicate the macro definition in every header though, but
that seems like a better tradeoff right now.
Differential Revision: https://reviews.llvm.org/D90633
* Finishes support for Context, InsertionPoint and Location to be carried by the thread using context managers.
* Introduces type casters and utilities so that DefaultPyMlirContext and DefaultPyLocation in method signatures does the right thing (allows explicit or gets from the thread context).
* Extend the rules for the thread context stack to handle nesting, appropriately inheriting and clearing depending on whether the context is the same.
* Refactors all method signatures to follow the new convention on trailing parameters for defaulting parameters (loc, ip, context). When the objects are carried in the thread context, this allows most explicit uses of these values to be elided.
* Removes the style guide section on putting accessors to construct global objects on the PyMlirContext: this style fails to make good use of the new facility since it is often the only thing remaining needing an MlirContext.
* Moves Module parse/creation from mlir.ir.Context to static methods on mlir.ir.Module.
* Moves Context.create_operation to a static Operation.create method.
* Moves Type parsing from mlir.ir.Context to static methods on mlir.ir.Type.
* Moves Attribute parsing from mlir.ir.Context to static methods on mlir.ir.Attribute.
* Move Location factory methods from mlir.ir.Context to static methods on mlir.ir.Location.
* Refactors the std dialect fake "ODS" generated code to take advantage of the new scheme.
Differential Revision: https://reviews.llvm.org/D90547
This pass allows removing getResultConversionKind from
BufferizeTypeConverter. This pass replaces the AppendToArgumentsList
functionality. As far as I could tell, the only use of this functionlity
is to perform the transformation that is implemented in this pass.
Future patches will remove the getResultConversionKind machinery from
BufferizeTypeConverter, but sending this patch for individual review for
clarity.
Differential Revision: https://reviews.llvm.org/D90071
This commit adds a new library that merges/combines a number of spv
modules into a combined one. The library has a single entry point:
combine(...).
To combine a number of MLIR spv modules, we move all the module-level ops
from all the input modules into one big combined module. To that end, the
combination process can proceed in 2 phases:
(1) resolving conflicts between pairs of ops from different modules
(2) deduplicate equivalent ops/sub-ops in the merged module. (TODO)
This patch implements only the first phase.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D90477
The bufferization patterns are moved to the .cpp file, which is
preferred in the codebase when it makes sense.
The LinalgToStandard patterns are kept a header because they are
expected to be used individually. However, they are moved to
LinalgToStandard.h which is the file corresponding to where they are
defined.
This also removes TensorCastOpConverter, which is handled by
populateStdBufferizePatterns now. Eventually, the constant op lowering
will be handled as well, but it there are currently holdups on moving
it (see https://reviews.llvm.org/D89916).
Differential Revision: https://reviews.llvm.org/D90254
This commit adds a new library that merges/combines a number of spv
modules into a combined one. The library has a single entry point:
combine(...).
To combine a number of MLIR spv modules, we move all the module-level ops
from all the input modules into one big combined module. To that end, the
combination process can proceed in 2 phases:
(1) resolving conflicts between pairs of ops from different modules
(2) deduplicate equivalent ops/sub-ops in the merged module. (TODO)
This patch implements only the first phase.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D90477
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
This commit adds a new library that merges/combines a number of spv
modules into a combined one. The library has a single entry point:
combine(...).
To combine a number of MLIR spv modules, we move all the module-level ops
from all the input modules into one big combined module. To that end, the
combination process can proceed in 2 phases:
(1) resolving conflicts between pairs of ops from different modules
(2) deduplicate equivalent ops/sub-ops in the merged module. (TODO)
This patch implements only the first phase.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D90022
This op returns a boolean value indicating whether 2 ops are
broadcastable or not. This follows the same logic as the other ops with
broadcast in their names in the shape dialect.
Concretely, shape.is_broadcastable returning true implies that
shape.broadcast will not give an error, and shape.cstr_broadcastable
will not result in an assertion failure. Similarly, false implies an
error or assertion failure.
Previously they were separated into "instance" and "kind" aliases, and also required that the dialect know ahead of time all of the instances that would have a corresponding alias. This approach was very clunky and not ergonomic to interact with. The new approach is to provide the dialect with an instance of an attribute/type to provide an alias for, fully replacing the original split approach.
Differential Revision: https://reviews.llvm.org/D89354
* Removes index based insertion. All insertion now happens through the insertion point.
* Introduces thread local context managers for implicit creation relative to an insertion point.
* Introduces (but does not yet use) binding the Context to the thread local context stack. Intent is to refactor all methods to take context optionally and have them use the default if available.
* Adds C APIs for mlirOperationGetParentOperation(), mlirOperationGetBlock() and mlirBlockGetTerminator().
* Removes an assert in PyOperation creation that was incorrectly constraining. There is already a TODO to rework the keepAlive field that it was guarding and without the assert, it is no worse than the current state.
Differential Revision: https://reviews.llvm.org/D90368
Fix semantic in the distribute integration test based on offline feedback. This
exposed a bug in block distribution, we need to make sure the id is multiplied
by the stride of the vector. Fix the transformation and unit test.
Differential Revision: https://reviews.llvm.org/D89291
For the synchronous case, destroy the stream after synchronization.
Sneak in a unrelated change to report why the gpu.wait conversion pattern didn't match.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D89933
This is a roll-forward of rGec7780ebdab4, now that the remaining
gpu.launch_func have been converted to custom form in rGb22f111023ba.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D90420
This should fix the reason for the failures after ec7780ebda. I will roll forward in a separate change.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D90410
Linalg "tile-and-fuse" is currently exposed as a Linalg pass "-linalg-fusion" but only the mechanics of the transformation are currently relevant.
Instead turn it into a "-test-linalg-greedy-fusion" pass which performs canonicalizations to enable more fusions to compose.
This allows dropping the OperationFolder which is not meant to be used with the pattern rewrite infrastructure.
Differential Revision: https://reviews.llvm.org/D90394
Update op is modelling the update directive (2.14.4) from the OpenACC specs.
An if condition and a device_type list can be attached to the directive. This patch add
these two information to the current op.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D90310
The previous ordering continued to use the original assuming after
replacing it which is not allowed. Now, inline the region from the old
into the new before the replacement.
Differential Revision: https://reviews.llvm.org/D90375
* Check region count for unknown symbol tables first, as it is a faster check
* Add an accessor to MutableDictionaryAttr to get the internal dictionary without creating a new one if it is empty. This avoids an otherwise unnecessary lookup of an MLIRContext.
Often times the legality of inlining can change depending on if the callable is going to be inlined in-place, or cloned. For example, some operations are not allowed to be duplicated and can only be inlined if the original callable will cease to exist afterwards. The new `wouldBeCloned` flag allows for dialects to hook into this when determining legality.
Differential Revision: https://reviews.llvm.org/D90360
In certain situations it isn't legal to inline a call operation, but this isn't something that is possible(at least not easily) to prevent with the current hooks. This revision adds a new hook so that dialects with call operations that shouldn't be inlined can prevent it.
Differential Revision: https://reviews.llvm.org/D90359
This patch fixes a bug [[ https://bugs.llvm.org/show_bug.cgi?id=46091 | 46091 ]]
Raw data for the `dense-element attribute` is written in little endian (LE) format.
This commit converts the format to big endian (BE) in ʻAttribute Parser` on the
BE machine. Also, when outputting on a BE machine, the BE format is converted
to LE in "AsmPrinter".
Differential Revision: https://reviews.llvm.org/D80695
This revision optimizes the parsing of hex strings by using the checked variant of llvm::fromHex, and adding a specialized method to Token for extracting hex strings. This leads a large decrease in compile time when parsing large hex constants (one example: 2.6 seconds -> 370 miliseconds)
Differential Revision: https://reviews.llvm.org/D90266
This fixes a subtle issue, described in the comment starting with
"Clone the op without the regions and inline the regions from the old op",
which prevented this conversion from working on non-trivial examples.
Differential Revision: https://reviews.llvm.org/D90203
Getting the body of a Module is a common need which justifies a
dedicated accessor instead of forcing users to go through the
region->blocks->front unwrapping manually.
Differential Revision: https://reviews.llvm.org/D90287
This commit changes to use plain values instead of references.
We need to copy it anyway. References forbid using temporary
values generated from expressions.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D90277
At this point, these methods are just carbon copies of OpBuilder::create and aren't necessary given that PatternRewriter inherits from OpBuilder.
Differential Revision: https://reviews.llvm.org/D90087
An InterfaceMap is generated for every single operation type, and is responsible for a large amount of the code size from MLIR given that its internals highly utilize templates. This revision refactors the internal implementation to use bare malloc/free for interface instances as opposed to static variables and moves as much code out of templates as possible. This led to a decrease of over >1mb (~12% of total MLIR related code size) for a downstream MLIR library with a large amount of operations.
Differential Revision: https://reviews.llvm.org/D90086
When compiling for code size, the use of a vtable causes a destructor(and constructor in certain cases) to be generated for the class. Interface models don't need a complex constructor or a destructor, so this can lead to many megabytes of code size increase(even in opt). This revision switches to a simpler struct of function pointers approach that accomplishes the same API requirements as before. This change requires no updates to user code, or any other code aside from the generator, as the user facing API is still exactly the same.
Differential Revision: https://reviews.llvm.org/D90085
All InterfaceMethods will have a corresponding entry in the interface model, and by extension have an implementation generated for every operation type. This can result in large binary size increases when a large amount of operations use an interface, such as the side effect interface.
Differential Revision: https://reviews.llvm.org/D90084
This patch adds support for fusing linalg.indexed_generic op with
linalg.tensor_reshape op by expansion, i.e.
- linalg.indexed_generic op -> linalg.tensor_reshape op when the
latter is expanding.
- linalg.tensor_reshape op -> linalg.indexed_generic op when the
former is folding.
Differential Revision: https://reviews.llvm.org/D90082
These logically belong together since it's a base commit plus
followup fixes to less common build configurations.
The patches are:
Revert "CfgInterface: rename interface() to getInterface()"
This reverts commit a74fc48158.
Revert "Wrap CfgTraitsFor in namespace llvm to please GCC 5"
This reverts commit f2a06875b6.
Revert "Try to make GCC5 happy about the CfgTraits thing"
This reverts commit 03a5f7ce12.
Revert "Introduce CfgTraits abstraction"
This reverts commit c0cdd22c72.