Since a SPIR-V module has an optional name, this patch
passes it to `ModuleOp` during conversion.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D90904
The tests are intended to exercise the public C API and will link to a
specific shared library exposing only the C API; this library itself may
link to libMLIR.so.
If we link some LLVM libraries statically in the tests themselves, we end
up with duplicated cl::opt registrations in LLVM. A possible setup if
these libraries were needed could be to link libMLIR.so directly when
available and link statically when it isn't available (in which case the
library exposing the C API would be statically linked and isolated from the
cl::opt registry, hopefully).
Differential Revision: https://reviews.llvm.org/D90993
I ran into this pattern when converting elementwise ops like
`addf %arg0, %arg : tensor<?xf32>` to linalg. Redundant arguments can
also easily arise from linalg-fusion-for-tensor-ops.
Also, fix some small bugs in the logic in
LinalgStructuredOpsInterface.td.
Differential Revision: https://reviews.llvm.org/D90812
The PyOpOperands container was erroneously constructing objects for
individual operands as PyOpResult. Operands in fact are just values,
which may or may not be results of another operation. The code would
eventually crash if the operand was a block argument. Add a test that
exercises the behavior that previously led to crashes.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D90917
We were discussing on discord regarding the need for extension-based systems like Python to dynamically link against MLIR (or else you can only have one extension that depends on it). Currently, when I set that up, I piggy-backed off of the flag that enables building libLLVM.so and libMLIR.so and depended on libMLIR.so from the python extension if shared library building was enabled. However, this is less than ideal.
In the current setup, libMLIR.so exports both all symbols from the C++ API and the C-API. The former is a kitchen sink and the latter is curated. We should be splitting them and for things that are properly factored to depend on the C-API, they should have the option to *only* depend on the C-API, and we should build that shared library no matter what. Its presence isn't just an optimization: it is a key part of the system.
To do this right, I needed to:
* Introduce visibility macros into mlir-c/Support.h. These should work on both *nix and windows as-is.
* Create a new libMLIRPublicAPI.so with just the mlir-c object files.
* Compile the C-API with -fvisibility=hidden.
* Conditionally depend on the libMLIR.so from libMLIRPublicAPI.so if building libMLIR.so (otherwise, also links against the static libs and will produce a mondo libMLIRPublicAPI.so).
* Disable re-exporting of static library symbols that come in as transitive deps.
This gives us a dynamic linked C-API layer that is minimal and should work as-is on all platforms. Since we don't support libMLIR.so building on Windows yet (and it is not very DLL friendly), this will fall back to a mondo build of libMLIRPublicAPI.so, which has its uses (it is also the most size conscious way to go if you happen to know exactly what you need).
Sizes (release/stripped, Ubuntu 20.04):
Shared library build:
libMLIRPublicAPI.so: 121Kb
_mlir.cpython-38-x86_64-linux-gnu.so: 1.4Mb
mlir-capi-ir-test: 135Kb
libMLIR.so: 21Mb
Static build:
libMLIRPublicAPI.so: 5.5Mb (since this is a "static" build, this includes the MLIR implementation as non-exported code).
_mlir.cpython-38-x86_64-linux-gnu.so: 1.4Mb
mlir-capi-ir-test: 44Kb
Things like npcomp and circt which bring their own dialects/transforms/etc would still need the shared library build and code that links against libMLIR.so (since it is all C++ interop stuff), but hopefully things that only depend on the public C-API can just have the one narrow dep.
I spot checked everything with nm, and it looks good in terms of what is exporting/importing from each layer.
I'm not in a hurry to land this, but if it is controversial, I'll probably split off the Support.h and API visibility macro changes, since we should set that pattern regardless.
Reviewed By: mehdi_amini, benvanik
Differential Revision: https://reviews.llvm.org/D90824
There exists a generic folding facility that folds the operand of a memref_cast
into users of memref_cast that support this. However, it was not used for the
memref_cast itself. Fix it to enable elimination of memref_cast chains such as
%1 = memref_cast %0 : A to B
%2 = memref_cast %1 : B to A
that is achieved by combining the folding with the existing "A to A" cast
elimination.
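
As a rough illustration of how the two rules combine, here is a minimal
Python sketch. It is a toy model, not the MLIR implementation: the `Value`
and `Cast` classes and the `fold_cast` helper are hypothetical names, types
are plain strings, and the compatibility checks that the real folding
performs are omitted.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Value:
    """A plain value with a type (hypothetical stand-in for a memref value)."""
    name: str
    type: str


@dataclass(frozen=True)
class Cast:
    """A toy cast that reinterprets `source` as `type`."""
    source: Union["Value", "Cast"]
    type: str


def fold_cast(op):
    """Fold a cast chain using the two rules described above."""
    if not isinstance(op, Cast):
        return op
    source = fold_cast(op.source)
    # Rule 1: fold the operand of an inner cast into this cast,
    # so that cast(cast(x)) reads x directly.
    if isinstance(source, Cast):
        source = source.source
    # Rule 2: an "A to A" cast is a no-op and is replaced by its operand.
    if source.type == op.type:
        return source
    return Cast(source, op.type)


v0 = Value("%0", "A")
chain = Cast(Cast(v0, "B"), "A")  # models %2 in the example above
folded = fold_cast(chain)         # the whole chain collapses to %0
```

A single cast from A to B is left untouched, since neither rule applies to it.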
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D90910
The CMake macro refactoring had a hardcoded value left instead of using
the function argument.
Didn't catch it locally before because it required a clean build to
trigger.
This target will depend on each individual extension and represent "all"
Python bindings in the repo. User projects can get finer-grained control by
depending directly on some individual targets as needed.
The Python bindings now require -DLLVM_BUILD_LLVM_DYLIB=ON to build.
This change is needed to be able to build multiple Python native
extensions without having each of them embed a copy of MLIR, which
would make them incompatible with each other. Instead they should all
link to the same copy of MLIR.
Differential Revision: https://reviews.llvm.org/D90813
This functionality is superseded by the BufferResultsToOutParams pass (see
https://reviews.llvm.org/D90071) for users that require buffers to be
out-params. That pass should be run immediately after all tensors are gone from
the program (before buffer optimizations and deallocation insertion), such as
immediately after a "finalizing" bufferize pass.
The -test-finalizing-bufferize pass now defaults to what used to be the
`allowMemrefFunctionResults=true` flag, and the
finalizing-bufferize-allowed-memref-results.mlir file is moved
to test/Transforms/finalizing-bufferize.mlir.
Differential Revision: https://reviews.llvm.org/D90778
TestDialect has many operations and they all live in ::mlir namespace.
Sometimes it is not clear whether the ops used in the code for the test passes
belong to Standard or to Test dialects.
Also, with this change it is easier to understand what test passes registered
in mlir-opt are actually passes in mlir/test.
Differential Revision: https://reviews.llvm.org/D90794
The test file is a long list of functions, followed by equally long FileCheck
comments inside "main". Distribute FileCheck comments closer to the functions
that produce the output we are checking.
Reviewed By: mehdi_amini, stellaraccident
Differential Revision: https://reviews.llvm.org/D90743
The LinalgDependenceGraph and alias analysis provide the necessary analysis for the Linalg fusion on buffers case.
However, this is not enough for Linalg on tensors, which requires proper memory effects to play nicely with DCE and other transformations.
This revision adds side effects to Linalg ops that were previously missing and has 2 consequences:
1. one example in the copy removal pass now fails since the linalg.generic op has side effects and the pass does not perform alias analysis / distinguish between reads and writes.
2. a few examples in fusion-tensor.mlir need to return the resulting tensor otherwise DCE automatically kicks in as part of greedy pattern application.
Differential Revision: https://reviews.llvm.org/D90762
VectorExtractDynamicOp in SPIRV dialect
conversion from vector.extractelement to spirv VectorExtractDynamicOp
Differential Revision: https://reviews.llvm.org/D90679
Per spec, vector sizes 8 and 16 are allowed when Vector16 capability is present.
This change expands the limitation of vector sizes to accept these sizes.
Differential Revision: https://reviews.llvm.org/D90683
The previous behavior was fragile when building an OpPassManager using a
string, as it was forcing the client to ensure the string to outlive the
entire PassManager.
This isn't a performance sensitive area either that would justify
optimizing further.
- The ODS description was using an old syntax that was updated during the review.
This fixes the ODS description to match the current syntax.
Differential Revision: https://reviews.llvm.org/D90797
- Eliminate duplicated information about mapping from memref -> its descriptor fields
by consolidating that mapping in two functions: getMemRefDescriptorFields and
getUnrankedMemRefDescriptorFields.
- Change convertMemRefType() and convertUnrankedMemRefType() to use these
functions.
- Remove convertMemrefSignature and convertUnrankedMemrefSignature.
Differential Revision: https://reviews.llvm.org/D90707
Previously, linalg-bufferize was a "finalizing" bufferization pass (it
did a "full" conversion). This wasn't great because it couldn't be used
composably with other bufferization passes like std-bufferize and
scf-bufferize.
This patch makes linalg-bufferize a composable bufferization pass.
Notice that the integration tests are switched over to using a pipeline
of std-bufferize, linalg-bufferize, and (to finalize the conversion)
func-bufferize. It all "just works" together.
While doing this transition, I ran into a nasty bug in the 1-use special
case logic for forwarding init tensors. That logic, while
well-intentioned, was fundamentally flawed, because it assumed that if
the original tensor value had one use, then the converted memref could
be mutated in place. That assumption is wrong in many cases. For
example:
```
%0 = some_tensor : tensor<4xf32>
br ^bb0(%0, %0: tensor<4xf32>, tensor<4xf32>)
^bb0(%bbarg0: tensor<4xf32>, %bbarg1: tensor<4xf32>)
// %bbarg0 is an alias of %bbarg1. We cannot safely write
// to it without analyzing uses of %bbarg1.
linalg.generic ... init(%bbarg0) {...}
```
A similar example can happen in many scenarios with function arguments.
Even more sinister, if the converted memref is produced by a
`std.get_global_memref` of a constant global memref, then we might
attempt to write into read-only statically allocated storage! Not all
memrefs are writable!
Clearly, this 1-use check is not a local transformation that we can do
on the fly in this pattern, so I removed it.
The test is now drastically shorter and I basically rewrote the CHECK
lines from scratch because:
- the new composable linalg-bufferize just doesn't do as much, so there
is less to test
- a lot of the tests were related to the 1-use check, which is now gone,
so there is less to test
- the `-buffer-hoisting -buffer-deallocation` is no longer mixed in, so
the checks related to that had to be rewritten
Differential Revision: https://reviews.llvm.org/D90657
When the "after" region of a WhileOp is merely forwarding its arguments back to
the "before" region, i.e. WhileOp is a canonical do-while loop, a simpler CFG
subgraph that omits the "after" region with its extra branch operation can be
produced. Loop rotation from general "while" to "if { do-while }" is left for a
future canonicalization pattern when it becomes necessary.
Differential Revision: https://reviews.llvm.org/D90604
The lowering is a straightforward inlining of the "before" and "after" regions
connected by (conditional) branches. This plugs the WhileOp into the
progressive lowering scheme. Future commits may choose to target WhileOp
instead of CFG when lowering ForOp.
Differential Revision: https://reviews.llvm.org/D90603
The new construct represents a generic loop with two regions: one executed
before the loop condition is verified and another after that. This construct
can be used to express both a "while" loop and a "do-while" loop, depending on
where the main payload is located. It is intended as an intermediate
abstraction for lowering, which will be added later. This form is relatively
easy to target from higher-level abstractions and supports transformations such
as loop rotation and LICM.
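
To make the two-region structure concrete, here is a small Python sketch of
the control flow. It only models the semantics described above; `run_while`
and its callback signatures are hypothetical and are not part of any MLIR
API.

```python
def run_while(before, after, init):
    """Run a generic loop made of two regions.

    `before(args)` returns (condition, values) and always runs first.
    `after(values)` returns the arguments for the next `before` call and
    runs only when the condition was true.
    """
    args = init
    while True:
        condition, values = before(args)
        if not condition:
            return values
        args = after(values)


# "while" loop: the condition is in `before`, the payload is in `after`.
while_result = run_while(
    before=lambda i: (i < 3, i),
    after=lambda i: i + 1,
    init=10,
)

# "do-while" loop: the payload is in `before`, `after` only forwards values.
do_while_result = run_while(
    before=lambda i: (i + 1 < 3, i + 1),
    after=lambda i: i,
    init=10,
)
```

Starting from 10, the "while" form never runs its payload, whereas the
"do-while" form runs it exactly once before the condition fails.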
Differential Revision: https://reviews.llvm.org/D90255
* All functions that return an Operation now return an OpView.
* All functions that accept an Operation now accept an _OperationBase, which both Operation and OpView extend and can resolve to the backing Operation.
* Moves user-facing instance methods from Operation -> _OperationBase so that both can have the same API.
* Concretely, this means that if there are custom op classes defined (i.e. in Python), any iteration or creation will return the appropriate instance (i.e. if you get/create an std.addf, you will get an instance of the mlir.dialects.std.AddFOp class, getting full access to any custom API it exposes).
* Refactors all __eq__ methods after realizing the proper way to do this for _OperationBase.
Differential Revision: https://reviews.llvm.org/D90584
This is useful in C source files where it is easy for a typo to be
silently assumed by the compiler to be an implicit declaration.
Differential Revision: https://reviews.llvm.org/D90727
This delegates control of the buffering to the user of the API. This
seems like a safer option as messages are immediately propagated to the
user, which may lead to less surprising behavior during debugging for
instance.
In terms of performance, a user can add a buffered stream on the other
side of the callback.
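
A minimal sketch of what buffering "on the other side of the callback" could
look like, written in Python for brevity. The `print_ir` function and the
`BufferedSink` class are hypothetical stand-ins, not the C API; the only
point is that the producer calls the callback immediately and the user
decides when to flush.

```python
import io


def print_ir(chunks, callback):
    """Hypothetical printer: forwards every chunk to `callback` right away."""
    for chunk in chunks:
        callback(chunk)


class BufferedSink:
    """User-side buffering layered on top of an unbuffered callback."""

    def __init__(self, flush_to, capacity=64):
        self._buffer = io.StringIO()
        self._flush_to = flush_to
        self._capacity = capacity

    def __call__(self, chunk):
        self._buffer.write(chunk)
        # Flush once the buffered text reaches the chosen capacity.
        if self._buffer.tell() >= self._capacity:
            self.flush()

    def flush(self):
        data = self._buffer.getvalue()
        if data:
            self._flush_to(data)
        self._buffer = io.StringIO()


received = []
sink = BufferedSink(received.append, capacity=8)
print_ir(["ab", "cd", "efgh", "ij"], sink)
sink.flush()  # flush the remaining tail explicitly
```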
Differential Revision: https://reviews.llvm.org/D90726
This is exposing the basic functionalities (create, nest, addPass, run) of
the PassManager through the C API in the new header: `include/mlir-c/Pass.h`.
In order to exercise it in the unit-test, a basic TableGen backend is
also provided to generate a simple C wrapper around the pass
constructor. It is used to expose the libTransforms passes to the C API.
Reviewed By: stellaraccident, ftynse
Differential Revision: https://reviews.llvm.org/D90667
- Verify that attributes parsed using a custom parser do not have duplicates.
- If there are duplicates in the attribute dictionary in the input, they get caught during the
dictionary parsing.
- This check verifies that there is no duplication between the parsed dictionary and any
attributes that might be added by the custom parser (or when the custom parsing code
adds duplicate attributes).
- Fixes https://bugs.llvm.org/show_bug.cgi?id=48025
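
The check can be pictured with the following Python sketch. It is only an
illustration of the idea, not the parser code: `find_duplicate_attributes`
is a hypothetical helper, and attributes are modeled as plain
(name, value) pairs.

```python
def find_duplicate_attributes(parsed_dictionary, added_by_parser):
    """Return attribute names that appear more than once overall.

    parsed_dictionary: (name, value) pairs from the attribute dictionary.
    added_by_parser:   (name, value) pairs added by custom parsing code.
    """
    seen = set()
    duplicates = []
    for name, _ in list(parsed_dictionary) + list(added_by_parser):
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates


# The dictionary and the custom parser both provide "value": a conflict.
conflict = find_duplicate_attributes(
    [("value", 1), ("sym_name", "foo")],
    [("value", 2)],
)
# No overlap between the two sources: nothing to report.
clean = find_duplicate_attributes([("sym_name", "foo")], [("value", 2)])
```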
Differential Revision: https://reviews.llvm.org/D90502
Previously, they were only defined for `FuncOp`.
To support this, `FunctionLike` needs a way to get an updated type
from the concrete operation. This adds a new hook for that purpose,
called `getTypeWithoutArgsAndResults`.
For now, `FunctionLike` continues to assume the type is
`FunctionType`, and concrete operations that use another type can hide
the `getType`, `setType`, and `getTypeWithoutArgsAndResults` methods.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D90363
The OpenMP dialect include is only needed for translation
and is not required in LLVM dialect.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D90510
* Use function_ref instead of std::function in several methods
* Use ::get instead of ::getChecked for IntegerType.
- It is already fully verified and constructing a mlir::Location can be extremely costly during parsing.
BufferizeTests.
Summary:
Added test operations to replace the LinalgDialect dependency in tests
which use the buffer-deallocation, buffer-hoisting,
buffer-loop-hoisting, promote-buffers-to-stack,
buffer-placement-preparation-allowed-memref-results and
buffer-placement-preparation passes. Adapted the corresponding test cases
and TestBufferPlacement.cpp.
Differential Revision: https://reviews.llvm.org/D90037
This is error-prone behavior: I frequently have ~20 min debugging sessions when I hit
an unexpected implicit nesting. This default makes the C++ API safer for users.
Depends On D90669
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D90671
This simplifies a few parts of the pass manager, but in particular we don't add as many
verifier passes as there are passes in the pipeline, and we can now enable/disable the
verifier after the fact on an already built PassManager.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D90669
This header was an initial early attempt at a crude C API for bindings,
but it isn't used and is redundant with the new API. At this point it only
contributes to more confusion.
Differential Revision: https://reviews.llvm.org/D90643
Because cstr operations allow more instruction reordering than asserts, we only
lower cstr_broadcastable to std ops with cstr_require. This ensures that the
more drastic lowering to asserts can happen specifically with the user's desire.
Differential Revision: https://reviews.llvm.org/D89325
This patch renames AffineParallelNormalize to AffineLoopNormalize to make it
more generic and be able to hold more loop normalization transformations in
the future for affine.for and affine.parallel ops. Eventually, it could also be
extended to support scf.for and scf.parallel. As a starting point for affine.for,
the patch also adds support for removing single-iteration affine.for ops to
the pass.
Differential Revision: https://reviews.llvm.org/D90267
This revision refactors the base Op/AbstractOperation classes to reduce the amount of generated code size when defining a new operation. The current scheme involves taking the address of functions defined directly on Op and Trait classes. This is problematic because even when these functions are empty/unused, they still end up being defined in the main executable. In this revision, we switch to using SFINAE and template type filtering to remove functions that are not needed/used. For example, if an operation does not define a custom `print` method we shouldn't define a templated `printAssembly` method for it. The same applies to parsing/folding/verification/etc. This dropped MLIR code size for a large downstream library by ~10% (~1 MB in an opt build).
Differential Revision: https://reviews.llvm.org/D90196
Looks like we have a blind spot in the testing matrix.
AsyncRegionRewriter.cpp: In member function ‘virtual void {anonymous}::GpuAsyncRegionPass::runOnFunction()’:
AsyncRegionRewriter.cpp:113:16: internal compiler error: in replace_placeholders_r, at cp/tree.c:2804
if (getFunction()
~~~~~~~~~~~~~
.getRegion()
~~~~~~~~~~~~
.walk(Callback{OpBuilder{&getContext()}})
~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Add standard dialect operations to define global variables with memref types and to
retrieve the memref for a named global variable
- Extend unit tests to test verification for these operations.
Differential Revision: https://reviews.llvm.org/D90337
BufferPlacement is no longer part of bufferization. However, this test
is an important test of "finalizing" bufferize passes.
A "finalizing" bufferize conversion is one that performs a "full"
conversion and expects all tensors to be gone from the program. This in
particular involves rewriting funcs (including block arguments of the
contained region), calls, and returns. The unique property of finalizing
bufferization passes is that they cannot be done via a local
transformation with suitable materializations to ensure composability
(as other bufferization passes do). For example, if a call is
rewritten, the callee needs to be rewritten otherwise the IR will end up
invalid. Thus, finalizing bufferization passes require an atomic change
to the entire program (e.g. the whole module).
This new designation makes it clear also that it shouldn't be testing
bufferization of linalg ops, so the tests have been updated to not use
linalg.generic ops. (linalg.copy is still used as the "copy" op for
copying into out-params)
Differential Revision: https://reviews.llvm.org/D89979
This is the most basic possible finalizing bufferization pass, which I
also think is sufficient for most new use cases. The more concentrated
nature of this pass also greatly clarifies the invariants that it
requires on its input to safely transform the program (see the
pass description in Passes.td).
With this pass, I have now upstreamed practically all of the
bufferizations from npcomp (the exception being std.constant, which can
be upstreamed when std.global_memref lands:
https://llvm.discourse.group/t/rfc-global-variables-in-mlir/2076/16 )
Differential Revision: https://reviews.llvm.org/D90205
Leaking macros isn't a good practice when defining headers. This
requires duplicating the macro definition in every header, but
that seems like a better tradeoff right now.
Differential Revision: https://reviews.llvm.org/D90633
* Finishes support for Context, InsertionPoint and Location to be carried by the thread using context managers.
* Introduces type casters and utilities so that DefaultPyMlirContext and DefaultPyLocation in method signatures do the right thing (allow an explicit value or get it from the thread context).
* Extend the rules for the thread context stack to handle nesting, appropriately inheriting and clearing depending on whether the context is the same.
* Refactors all method signatures to follow the new convention on trailing parameters for defaulting parameters (loc, ip, context). When the objects are carried in the thread context, this allows most explicit uses of these values to be elided.
* Removes the style guide section on putting accessors to construct global objects on the PyMlirContext: this style fails to make good use of the new facility since it is often the only thing remaining needing an MlirContext.
* Moves Module parse/creation from mlir.ir.Context to static methods on mlir.ir.Module.
* Moves Context.create_operation to a static Operation.create method.
* Moves Type parsing from mlir.ir.Context to static methods on mlir.ir.Type.
* Moves Attribute parsing from mlir.ir.Context to static methods on mlir.ir.Attribute.
* Move Location factory methods from mlir.ir.Context to static methods on mlir.ir.Location.
* Refactors the std dialect fake "ODS" generated code to take advantage of the new scheme.
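
The thread-context convention can be sketched in plain Python as follows.
This is a simplified model and not the bindings' implementation: the
`Context` class here, the `_stack` helper, and `make_type` are hypothetical,
and only show how a trailing optional parameter can default to whatever the
innermost `with` block made current on this thread.

```python
import threading

_state = threading.local()


def _stack():
    """Return this thread's context stack, creating it on first use."""
    if not hasattr(_state, "stack"):
        _state.stack = []
    return _state.stack


class Context:
    """Toy context object that can be made current with `with`."""

    def __enter__(self):
        _stack().append(self)
        return self

    def __exit__(self, exc_type, exc, tb):
        _stack().pop()
        return False

    @staticmethod
    def current():
        stack = _stack()
        if not stack:
            raise RuntimeError("no active Context; pass one explicitly")
        return stack[-1]


def make_type(spec, context=None):
    """Trailing optional `context`: explicit wins, else the thread default."""
    context = context if context is not None else Context.current()
    return (spec, context)


with Context() as outer:
    implicit = make_type("f32")            # picks up `outer`
    with Context() as inner:
        nested = make_type("i32")          # picks up `inner`
    restored = make_type("index")          # back to `outer`
explicit = make_type("f64", context=outer) # no `with` block needed
```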
Differential Revision: https://reviews.llvm.org/D90547
This pass allows removing getResultConversionKind from
BufferizeTypeConverter. This pass replaces the AppendToArgumentsList
functionality. As far as I could tell, the only use of this functionality
is to perform the transformation that is implemented in this pass.
Future patches will remove the getResultConversionKind machinery from
BufferizeTypeConverter, but sending this patch for individual review for
clarity.
Differential Revision: https://reviews.llvm.org/D90071
This commit adds a new library that merges/combines a number of spv
modules into a combined one. The library has a single entry point:
combine(...).
To combine a number of MLIR spv modules, we move all the module-level ops
from all the input modules into one big combined module. To that end, the
combination process can proceed in 2 phases:
(1) resolving conflicts between pairs of ops from different modules
(2) deduplicate equivalent ops/sub-ops in the merged module. (TODO)
This patch implements only the first phase.
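
A toy Python sketch of the first phase is shown below. The message does not
spell out the conflict-resolution policy, so renaming with a numeric suffix
is purely an assumption made for illustration; modules are modeled as
dictionaries from symbol name to op, and updating references to renamed
symbols is left out. This `combine` is a hypothetical helper, not the
library entry point.

```python
def combine(modules):
    """Merge modules (dicts of symbol name -> op) into one combined module.

    A symbol whose name is already taken is renamed with a numeric suffix
    (an assumed policy). Returns the combined module and the list of
    (module index, old name, new name) renames that were applied.
    """
    combined = {}
    renames = []
    for index, module in enumerate(modules):
        for name, op in module.items():
            new_name = name
            suffix = 0
            while new_name in combined:
                suffix += 1
                new_name = f"{name}_{suffix}"
            if new_name != name:
                renames.append((index, name, new_name))
            combined[new_name] = op
    return combined, renames


merged, applied_renames = combine([
    {"main": "entry A", "helper": "func A"},
    {"main": "entry B"},
])
```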
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D90477
The bufferization patterns are moved to the .cpp file, which is
preferred in the codebase when it makes sense.
The LinalgToStandard patterns are kept in a header because they are
expected to be used individually. However, they are moved to
LinalgToStandard.h which is the file corresponding to where they are
defined.
This also removes TensorCastOpConverter, which is handled by
populateStdBufferizePatterns now. Eventually, the constant op lowering
will be handled as well, but there are currently holdups on moving
it (see https://reviews.llvm.org/D89916).
Differential Revision: https://reviews.llvm.org/D90254
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
This commit adds a new library that merges/combines a number of spv
modules into a combined one. The library has a single entry point:
combine(...).
To combine a number of MLIR spv modules, we move all the module-level ops
from all the input modules into one big combined module. To that end, the
combination process can proceed in 2 phases:
(1) resolving conflicts between pairs of ops from different modules
(2) deduplicate equivalent ops/sub-ops in the merged module. (TODO)
This patch implements only the first phase.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D90022
This op returns a boolean value indicating whether 2 shapes are
broadcastable or not. This follows the same logic as the other ops with
broadcast in their names in the shape dialect.
Concretely, shape.is_broadcastable returning true implies that
shape.broadcast will not give an error, and shape.cstr_broadcastable
will not result in an assertion failure. Similarly, false implies an
error or assertion failure.
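
For fully static shapes, the predicate can be sketched in Python as below.
This assumes the usual NumPy-style rule (align shapes from the trailing
dimension; extents must match or one of them must be 1) and ignores dynamic
extents; `is_broadcastable` here is an illustrative function, not the op's
implementation.

```python
from itertools import zip_longest


def is_broadcastable(lhs, rhs):
    """Return True if two static shapes can be broadcast together."""
    # Walk both shapes from the trailing dimension; a missing leading
    # dimension behaves like an extent of 1.
    for a, b in zip_longest(reversed(lhs), reversed(rhs), fillvalue=1):
        if a != b and a != 1 and b != 1:
            return False
    return True


compatible = is_broadcastable([4, 1, 3], [2, 3])    # True
incompatible = is_broadcastable([2, 3], [4, 3])     # False
```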
Previously they were separated into "instance" and "kind" aliases, and also required that the dialect know ahead of time all of the instances that would have a corresponding alias. This approach was very clunky and not ergonomic to interact with. The new approach is to provide the dialect with an instance of an attribute/type to provide an alias for, fully replacing the original split approach.
Differential Revision: https://reviews.llvm.org/D89354
* Removes index based insertion. All insertion now happens through the insertion point.
* Introduces thread local context managers for implicit creation relative to an insertion point.
* Introduces (but does not yet use) binding the Context to the thread local context stack. Intent is to refactor all methods to take context optionally and have them use the default if available.
* Adds C APIs for mlirOperationGetParentOperation(), mlirOperationGetBlock() and mlirBlockGetTerminator().
* Removes an assert in PyOperation creation that was incorrectly constraining. There is already a TODO to rework the keepAlive field that it was guarding and without the assert, it is no worse than the current state.
Differential Revision: https://reviews.llvm.org/D90368
Fix semantic in the distribute integration test based on offline feedback. This
exposed a bug in block distribution, we need to make sure the id is multiplied
by the stride of the vector. Fix the transformation and unit test.
Differential Revision: https://reviews.llvm.org/D89291
For the synchronous case, destroy the stream after synchronization.
Sneak in an unrelated change to report why the gpu.wait conversion pattern didn't match.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D89933
This is a roll-forward of rGec7780ebdab4, now that the remaining
gpu.launch_func have been converted to custom form in rGb22f111023ba.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D90420
This should fix the reason for the failures after ec7780ebda. I will roll forward in a separate change.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D90410
Linalg "tile-and-fuse" is currently exposed as a Linalg pass "-linalg-fusion" but only the mechanics of the transformation are currently relevant.
Instead turn it into a "-test-linalg-greedy-fusion" pass which performs canonicalizations to enable more fusions to compose.
This allows dropping the OperationFolder which is not meant to be used with the pattern rewrite infrastructure.
Differential Revision: https://reviews.llvm.org/D90394
The update op models the update directive (2.14.4) from the OpenACC specs.
An if condition and a device_type list can be attached to the directive. This patch adds
these two pieces of information to the current op.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D90310
The previous ordering continued to use the original assuming op after
replacing it, which is not allowed. Now, inline the region from the old
op into the new one before the replacement.
Differential Revision: https://reviews.llvm.org/D90375
* Check region count for unknown symbol tables first, as it is a faster check
* Add an accessor to MutableDictionaryAttr to get the internal dictionary without creating a new one if it is empty. This avoids an otherwise unnecessary lookup of an MLIRContext.
Often times the legality of inlining can change depending on if the callable is going to be inlined in-place, or cloned. For example, some operations are not allowed to be duplicated and can only be inlined if the original callable will cease to exist afterwards. The new `wouldBeCloned` flag allows for dialects to hook into this when determining legality.
Differential Revision: https://reviews.llvm.org/D90360
In certain situations it isn't legal to inline a call operation, but this isn't something that is possible (at least not easily) to prevent with the current hooks. This revision adds a new hook so that dialects with call operations that shouldn't be inlined can prevent it.
Differential Revision: https://reviews.llvm.org/D90359
This patch fixes a bug [[ https://bugs.llvm.org/show_bug.cgi?id=46091 | 46091 ]]
Raw data for the `dense-element attribute` is written in little endian (LE) format.
This commit converts the format to big endian (BE) in `Attribute Parser` on the
BE machine. Also, when outputting on a BE machine, the BE format is converted
to LE in "AsmPrinter".
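
The conversion amounts to swapping the bytes of each element. The Python
sketch below illustrates that step on raw 32-bit data; `swap_element_bytes`
is a hypothetical helper and not the code used in the parser or printer.

```python
import struct


def swap_element_bytes(raw, element_size):
    """Reverse the byte order of each fixed-size element in `raw`."""
    if len(raw) % element_size != 0:
        raise ValueError("raw data is not a whole number of elements")
    return b"".join(
        raw[i:i + element_size][::-1]
        for i in range(0, len(raw), element_size)
    )


values = (1, 258)
little_endian = struct.pack("<2i", *values)  # how the raw data is written
big_endian = swap_element_bytes(little_endian, 4)

# Swapping per element yields exactly the big-endian encoding, and
# swapping again (as when printing) restores the little-endian form.
round_trip = swap_element_bytes(big_endian, 4)
```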
Differential Revision: https://reviews.llvm.org/D80695
This revision optimizes the parsing of hex strings by using the checked variant of llvm::fromHex, and adding a specialized method to Token for extracting hex strings. This leads to a large decrease in compile time when parsing large hex constants (one example: 2.6 seconds -> 370 milliseconds)
Differential Revision: https://reviews.llvm.org/D90266
This fixes a subtle issue, described in the comment starting with
"Clone the op without the regions and inline the regions from the old op",
which prevented this conversion from working on non-trivial examples.
Differential Revision: https://reviews.llvm.org/D90203
Getting the body of a Module is a common need which justifies a
dedicated accessor instead of forcing users to go through the
region->blocks->front unwrapping manually.
Differential Revision: https://reviews.llvm.org/D90287
This commit changes to use plain values instead of references.
We need to copy it anyway. References forbid using temporary
values generated from expressions.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D90277
At this point, these methods are just carbon copies of OpBuilder::create and aren't necessary given that PatternRewriter inherits from OpBuilder.
Differential Revision: https://reviews.llvm.org/D90087
An InterfaceMap is generated for every single operation type, and is responsible for a large amount of the code size from MLIR given that its internals highly utilize templates. This revision refactors the internal implementation to use bare malloc/free for interface instances as opposed to static variables and moves as much code out of templates as possible. This led to a decrease of over 1 MB (~12% of total MLIR-related code size) for a downstream MLIR library with a large number of operations.
Differential Revision: https://reviews.llvm.org/D90086
When compiling for code size, the use of a vtable causes a destructor (and a constructor in certain cases) to be generated for the class. Interface models don't need a complex constructor or a destructor, so this can lead to many megabytes of code size increase (even in opt). This revision switches to a simpler struct of function pointers approach that accomplishes the same API requirements as before. This change requires no updates to user code, or any other code aside from the generator, as the user-facing API is still exactly the same.
Differential Revision: https://reviews.llvm.org/D90085
All InterfaceMethods will have a corresponding entry in the interface model, and by extension have an implementation generated for every operation type. This can result in large binary size increases when a large number of operations use an interface, such as the side effect interface.
Differential Revision: https://reviews.llvm.org/D90084
This patch adds support for fusing linalg.indexed_generic op with
linalg.tensor_reshape op by expansion, i.e.
- linalg.indexed_generic op -> linalg.tensor_reshape op when the
latter is expanding.
- linalg.tensor_reshape op -> linalg.indexed_generic op when the
former is folding.
Differential Revision: https://reviews.llvm.org/D90082
These logically belong together since it's a base commit plus
followup fixes to less common build configurations.
The patches are:
Revert "CfgInterface: rename interface() to getInterface()"
This reverts commit a74fc48158.
Revert "Wrap CfgTraitsFor in namespace llvm to please GCC 5"
This reverts commit f2a06875b6.
Revert "Try to make GCC5 happy about the CfgTraits thing"
This reverts commit 03a5f7ce12.
Revert "Introduce CfgTraits abstraction"
This reverts commit c0cdd22c72.
* Still rough edges that need more sugar but the bones are there. Notes left in the test case for things that can be improved.
* Does not actually yield custom OpViews yet for traversing. Will rework that in a followup.
Differential Revision: https://reviews.llvm.org/D89932
A recent commit introduced a new syntax for specifying builder arguments in
ODS, which is more amenable to automated processing, and deprecated the old
form. Transition all dialects as well as Linalg ODS generator to use the new
syntax.
Add a deprecation notice to ODS generator.
Reviewed By: rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D90038
The TypeID instance was moved in D89153.
It wasn't caught that it broke MLIR pretty printers because pre-merge checks don't run check-debuginfo.
Avoid disabling all MLIR printers in case this happens again by catching the exception.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D90191
Using an Identifier is much more efficient for attribute lookups because it uses pointer comparison as opposed to string comparison.
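The idea can be illustrated with a minimal Python sketch. It is not MLIR's implementation: the `Identifier` class, its uniquing table, and `lookup` are hypothetical stand-ins. The string is hashed and compared once when the identifier is created, and every later lookup only compares object identity.

```python
class Identifier:
    """A string uniqued within a table, so equal names share one object."""

    _table = {}  # hypothetical uniquing table: str -> Identifier

    def __init__(self, name):
        self.name = name

    @classmethod
    def get(cls, name):
        # String hashing/comparison is paid once, when the identifier is made.
        ident = cls._table.get(name)
        if ident is None:
            ident = cls(name)
            cls._table[name] = ident
        return ident


def lookup(attrs, ident):
    """Find an attribute by identity (pointer) comparison, not string compare."""
    for key, value in attrs:
        if key is ident:
            return value
    return None


alignment = Identifier.get("alignment")
attrs = [(Identifier.get("sym_name"), "foo"), (alignment, 16)]
print(lookup(attrs, Identifier.get("alignment")))  # prints 16
```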
Differential Revision: https://reviews.llvm.org/D89660
This revision implements sharding in the storage of parametric instances to decrease lock contention by sharding out the allocator/mutex/etc. to use for a specific storage instance based on the hash key. This is a somewhat common approach to reducing lock contention on data structures, and is used by the concurrent hashmaps provided by folly/java/etc. For several compilations tested, this removed all/most lock contention from profiles and reduced compile time by several seconds.
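As a rough illustration of the sharding technique (not the actual storage uniquer code), the sketch below keeps one lock and one table per shard and picks the shard from the hash of the key. The class name `ShardedUniquer`, the shard count, and the `construct` callback are assumptions made for the example.

```python
import threading


class ShardedUniquer:
    """Uniques instances by key, with one lock and one table per shard."""

    def __init__(self, num_shards=8):
        self._shards = [(threading.Lock(), {}) for _ in range(num_shards)]

    def get_or_create(self, key, construct):
        # The hash of the key selects the shard, so threads working on
        # different keys usually take different locks.
        lock, table = self._shards[hash(key) % len(self._shards)]
        with lock:
            if key not in table:
                table[key] = construct(key)
            return table[key]


uniquer = ShardedUniquer()
a = uniquer.get_or_create(("vector", 4, "f32"), lambda key: object())
b = uniquer.get_or_create(("vector", 4, "f32"), lambda key: object())
print(a is b)  # prints True
```

A single global lock would serialize every lookup; with shards, contention only occurs between keys that hash to the same shard.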
Differential Revision: https://reviews.llvm.org/D89659
This class represents a rewrite pattern list that has been frozen, and is thus immutable. This replaces the uses of OwningRewritePatternList in pattern driver related API, such as dialect conversion. When PDL becomes more prevalent, this API will allow for optimizing a set of patterns once without the need to do this per run of a pass.
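The "prepare once, reuse on every run" idea can be sketched as follows. This is a toy model, not the MLIR class: patterns are assumed to be `(benefit, rewrite)` pairs, and sorting by benefit stands in for whatever one-time optimization the real infrastructure performs.

```python
class FrozenPatternList:
    """An immutable snapshot of (benefit, rewrite) pairs, prepared once."""

    def __init__(self, patterns):
        # One-time preparation: order by decreasing benefit and freeze.
        self._patterns = tuple(sorted(patterns, key=lambda p: p[0], reverse=True))

    def __iter__(self):
        return iter(self._patterns)

    def __len__(self):
        return len(self._patterns)


def apply_first_match(frozen, value):
    """Apply the highest-benefit rewrite that matches; None if none does."""
    for _benefit, rewrite in frozen:
        result = rewrite(value)
        if result is not None:
            return result
    return None


mutable = [
    (1, lambda v: v + 1 if v > 0 else None),
    (2, lambda v: v * 2 if v % 2 == 0 else None),
]
frozen = FrozenPatternList(mutable)
mutable.clear()  # later changes to the mutable list do not affect the snapshot
print(apply_first_match(frozen, 4))  # prints 8
```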
Differential Revision: https://reviews.llvm.org/D89104
There are several pieces of pattern rewriting infra in IR/ that really shouldn't be there. This revision moves those pieces to a better location such that they are easier to evolve in the future (e.g. with PDL). More concretely this revision does the following:
* Create a Transforms/GreedyPatternRewriteDriver.h and move the apply*andFold methods there.
The definitions for these methods are already in Transforms/ so it doesn't make sense for the declarations to be in IR.
* Create a new lib/Rewrite library and move PatternApplicator there.
This new library will be focused on applying rewrites, and will also include compiling rewrites with PDL.
Differential Revision: https://reviews.llvm.org/D89103
The Pattern class was originally intended to be used solely for matching operations, but that use never materialized. All of the pattern infrastructure uses RewritePattern, and the infrastructure for pure matching (Matchers.h) is implemented inline. This means that this class isn't a useful abstraction at the moment, so this revision refactors it to solely encapsulate the "metadata" of a pattern. The metadata includes the various state describing a pattern: benefit, root operation, etc. The API on PatternApplicator is updated to now operate on `Pattern`s as nothing special from `RewritePattern` is necessary.
This refactoring is also necessary for the upcoming use of PDL patterns alongside C++ rewrite patterns.
Differential Revision: https://reviews.llvm.org/D86258
The conversion between PDL and the interpreter is split into several different parts.
** The Matcher:
The matching section of all incoming pdl.pattern operations is converted into a predicate tree and merged. Each pattern is first converted into an ordered list of predicates starting from the root operation. A predicate is composed of three distinct parts:
* Position
- A position refers to a specific location on the input DAG, i.e. an
existing MLIR entity being matched. These can be attributes, operands,
operations, results, and types. Each position also defines a relation to
its parent. For example, the operand `[0] -> 1` has a parent operation
position `[0]` (the root).
* Question
- A question refers to a query on a specific positional value. For
example, an operation name question checks the name of an operation
position.
* Answer
- An answer is the expected result of a question. For example, when
matching an operation with the name "foo.op". The question would be an
operation name question, with an expected answer of "foo.op".
After the predicate lists have been created and ordered (based on occurrence of common predicates and other factors), they are formed into a tree of nodes that represent the branching flow of a pattern match. This structure allows for efficient construction and merging of the input patterns. There are currently only 4 simple nodes in the tree:
* ExitNode: Represents the termination of a match
* SuccessNode: Represents a successful match of a specific pattern
* BoolNode/SwitchNode: Branch to a specific child node based on the expected answer to a predicate question.
Once the matcher tree has been generated, this tree is walked to generate the corresponding interpreter operations.
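A simplified sketch of the ordering and merging steps is shown here. It is not the actual conversion code: predicates are modeled as plain tuples, ordering uses only the occurrence count (the "other factors" are ignored), and the tree is a nested dict in which the `"success"` key plays the role of a SuccessNode.

```python
from collections import Counter, namedtuple

# A predicate pairs a position in the input DAG with a question about it and
# the expected answer.
Predicate = namedtuple("Predicate", ["position", "question", "answer"])


def order_predicates(patterns):
    """Order each pattern's predicates so widely shared ones come first.

    `patterns` maps a pattern name to its list of predicates. Only the
    occurrence count is used here; ties keep their original order.
    """
    counts = Counter(p for preds in patterns.values() for p in preds)
    return {
        name: sorted(preds, key=lambda p: -counts[p])
        for name, preds in patterns.items()
    }


def build_tree(ordered):
    """Merge ordered predicate lists into a nested dict keyed by predicate."""
    root = {}
    for name, preds in ordered.items():
        node = root
        for pred in preds:
            node = node.setdefault(pred, {})
        node["success"] = name
    return root


op_name = Predicate("[0]", "operation_name", "foo.op")
patterns = {
    "A": [Predicate("[0]", "operand_count", 1), op_name],
    "B": [op_name, Predicate("[0]", "result_count", 2)],
}
tree = build_tree(order_predicates(patterns))
print(len(tree))  # prints 1: both patterns share the operation name check
```

Because the shared operation name predicate is moved to the front of both lists, the two patterns branch only after that common check.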
** The Rewriter:
The rewriter portion of a pattern is generated in a very straightforward manner, similarly to lowerings in other dialects. Each PDL operation that may exist within a rewrite has a mapping into the interpreter dialect. The code for the rewriter is generated within a FuncOp, that is invoked by the interpreter on a successful pattern match. Referenced values defined in the matcher become inputs to the generated rewriter function.
An example lowering is shown below:
```mlir
// The following high level PDL pattern:
pdl.pattern : benefit(1) {
%resultType = pdl.type
%inputOperand = pdl.input
%root, %results = pdl.operation "foo.op"(%inputOperand) -> %resultType
pdl.rewrite %root {
pdl.replace %root with (%inputOperand)
}
}
// is lowered to the following:
module {
// The matcher function takes the root operation as an input.
func @matcher(%arg0: !pdl.operation) {
pdl_interp.check_operation_name of %arg0 is "foo.op" -> ^bb2, ^bb1
^bb1:
pdl_interp.return
^bb2:
pdl_interp.check_operand_count of %arg0 is 1 -> ^bb3, ^bb1
^bb3:
pdl_interp.check_result_count of %arg0 is 1 -> ^bb4, ^bb1
^bb4:
%0 = pdl_interp.get_operand 0 of %arg0
pdl_interp.is_not_null %0 : !pdl.value -> ^bb5, ^bb1
^bb5:
%1 = pdl_interp.get_result 0 of %arg0
pdl_interp.is_not_null %1 : !pdl.value -> ^bb6, ^bb1
^bb6:
// This operation corresponds to a successful pattern match.
pdl_interp.record_match @rewriters::@rewriter(%0, %arg0 : !pdl.value, !pdl.operation) : benefit(1), loc([%arg0]), root("foo.op") -> ^bb1
}
module @rewriters {
// The inputs to the rewriter from the matcher are passed as arguments.
func @rewriter(%arg0: !pdl.value, %arg1: !pdl.operation) {
pdl_interp.replace %arg1 with(%arg0)
pdl_interp.return
}
}
}
```
Differential Revision: https://reviews.llvm.org/D84580
Adds support for
- Dropping unit dimension loops for indexed_generic ops.
- Folding consecutive folding (or expanding) reshapes when the result
(or src) is a scalar.
- Fixes to indexed_generic -> generic fusion when zero-dim tensors are
involved.
Differential Revision: https://reviews.llvm.org/D90118
Substitutes `Type` by `Attribute` in the declaration of AttributeInterface. It
looks like the code was written by copy-pasting the definition of TypeInterface,
but the substitution of Type by Attribute was missing at some places.
Reviewed By: rriddle, ftynse
Differential Revision: https://reviews.llvm.org/D90138
The alignment attribute in the 'alloca' op treats the '0' value as 'unset'.
When parsing the custom form of the 'alloca' op, ignore the alignment attribute
if its value is '0' instead of actually creating it and producing a
textually different yet semantically equivalent form in the output.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D90179
Based on discourse discussion, fix the doc string and remove examples with
incorrect semantics. Also fix the insert_map semantics by adding the missing
operand for the vector we are inserting into.
Differential Revision: https://reviews.llvm.org/D89563
This revision allows the fusion of the producer of input tensors in the consumer under a tiling transformation (which produces subtensors).
Many pieces are still missing (e.g. support init_tensors, better refactor LinalgStructuredOp interface support, try to merge implementations and reuse code) but this still allows getting started.
The greedy pass itself is just for testing purposes and will be extracted in a separate test pass.
Differential Revision: https://reviews.llvm.org/D89491
This patch introduces a SPIR-V runner. The aim is to run a gpu
kernel on a CPU via GPU -> SPIRV -> LLVM conversions. This is a first
prototype, so more features will be added in due time.
- Overview
The runner follows a flow similar to the other in-tree runners. However,
having converted the kernel to SPIR-V, we encode the bind attributes of
global variables that represent kernel arguments. Then the SPIR-V module is
converted to LLVM. On the host side, we emulate passing the data to the device
by creating globals in the main module with the same symbolic names as in the
kernel module. These global variables are later linked with the ones from the
nested module. We copy data from kernel arguments to globals, call the kernel
function from the nested module, and then copy the data back.
- Current state
At the moment, the runner is capable of running 2 modules, one nested in
another. The kernel module must contain exactly one kernel function. Also,
the runner supports rank 1 integer memref types as arguments (to be scaled).
- Enhancement of JitRunner and ExecutionEngine
To translate nested modules to LLVM IR, JitRunner and ExecutionEngine were
altered to take an optional (defaulting to `nullptr`) function reference that
is a custom LLVM IR module builder. This allows customizing LLVM IR module
creation from MLIR modules.
Reviewed By: ftynse, mravishankar
Differential Revision: https://reviews.llvm.org/D86108
This patch introduces a pass for running
`mlir-spirv-cpu-runner` - LowerHostCodeToLLVMPass.
This pass emulates `gpu.launch_func` call in LLVM dialect and lowers
the host module code to LLVM. It removes the `gpu.module`, creates a
sequence of global variables that are later linked to the variables
in the kernel module, as well as a series of copies to/from
them to emulate the memory transfer to/from the host or to/from the
device sides. It also converts the remaining Standard dialect into
LLVM dialect, emitting C wrappers.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D86112
This dependency was already existing indirectly, but is now more direct
since the registration relies on an inline function. This fixes the
link of the tools with BFD.
The current pattern for vector unrolling takes the native shape to
unroll to at pattern instantiation time, but the native shape might
differ based on the types of the operands. Introduce a
UnrollVectorOptions struct which allows for using a function that will
return the native shape based on the operation. Move other options of
unrolling like `filterConstraints` into this struct.
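The shape of such an options struct can be sketched in Python. This is only an illustration of passing a callback instead of a fixed shape: the names `UnrollOptions`, `native_shape`, `filter_constraint`, and the dict-based toy ops are all hypothetical, not the C++ API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class UnrollOptions:
    # Callback that picks the native shape for a given (toy) operation.
    native_shape: Callable[[dict], Optional[List[int]]]
    # Callback that decides whether the pattern applies to the op at all.
    filter_constraint: Callable[[dict], bool] = lambda op: True


def tiles_per_dim(op, options):
    """Return how many native-shape tiles cover each dimension, or None."""
    if not options.filter_constraint(op):
        return None
    native = options.native_shape(op)
    if native is None or any(d % n for d, n in zip(op["shape"], native)):
        return None
    return [d // n for d, n in zip(op["shape"], native)]


def shape_by_element_type(op):
    # Assumed policy for the example: smaller tiles for f16 than for f32.
    return [2, 2] if op["element_type"] == "f16" else [4, 4]


options = UnrollOptions(native_shape=shape_by_element_type)
print(tiles_per_dim({"shape": [8, 8], "element_type": "f32"}, options))  # [2, 2]
print(tiles_per_dim({"shape": [8, 8], "element_type": "f16"}, options))  # [4, 4]
```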
Differential Revision: https://reviews.llvm.org/D89744
Add folder for the case where ExtractStridedSliceOp source comes from a chain
of InsertStridedSliceOp. Also add a folder for the trivial case where the
ExtractStridedSliceOp is a no-op.
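The chain-walking idea can be modeled in one dimension with Python lists. This is a simplified sketch under stated assumptions (unit strides, rank 1, a hypothetical `Insert` record), not the folder's actual code: an extract that lies fully inside an inserted chunk reads from that chunk, an extract disjoint from an insert looks through it to the destination, and a partial overlap gives up.

```python
from dataclasses import dataclass


@dataclass
class Insert:
    """Inserts `source` (a list) into `dest` starting at `offset`."""
    source: list
    dest: object  # either another Insert or a plain list
    offset: int


def fold_extract(value, offset, size):
    """Resolve extract(value, offset, size) without materializing inserts.

    Returns (vector, offset) to extract from instead, or None if no fold
    applies.
    """
    while isinstance(value, Insert):
        ins_begin, ins_end = value.offset, value.offset + len(value.source)
        ext_begin, ext_end = offset, offset + size
        if ins_begin <= ext_begin and ext_end <= ins_end:
            # Fully inside the inserted chunk: read from the inserted source.
            return value.source, ext_begin - ins_begin
        if ext_end <= ins_begin or ins_end <= ext_begin:
            # Disjoint from this insert: look through it to its destination.
            value = value.dest
            continue
        return None  # Partial overlap: give up.
    return value, offset


base = [0] * 8
chain = Insert([1, 2], Insert([3, 4, 5], base, 4), 0)
vector, start = fold_extract(chain, 5, 2)
print(vector[start:start + 2])  # prints [4, 5]
```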
Differential Revision: https://reviews.llvm.org/D89850
This patch provides C API for MLIR affine expression.
- Implement C API for methods of AffineExpr class.
- Implement C API for methods of derived classes (AffineBinaryOpExpr, AffineDimExpr, AffineSymbolExpr, and AffineConstantExpr).
Differential Revision: https://reviews.llvm.org/D89856
Added optimization pass to convert heap-based allocs to stack-based allocas in
buffer placement. Added the corresponding test file.
Differential Revision: https://reviews.llvm.org/D89688
Before this change, we would run all `maxIterations` iterations if the first iteration changed the op.
After this change, we exit the loop as soon as an iteration hasn't changed the op.
Assuming that we have reached a fixed point when an iteration doesn't change the op, this doesn't affect correctness.
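The early-exit behavior can be sketched generically as follows. The function and parameter names are made up for the example and the "op" is just an integer; the point is only that the loop stops at the first iteration that changes nothing.

```python
def simplify_until_fixed_point(value, rewrite, max_iterations=10):
    """Apply `rewrite` until it stops changing `value` or the budget runs out.

    Returns the final value and the number of iterations that were run.
    """
    iterations = 0
    for _ in range(max_iterations):
        iterations += 1
        new_value = rewrite(value)
        if new_value == value:
            # Nothing changed: treat this as the fixed point and stop early.
            break
        value = new_value
    return value, iterations


def halve_even(n):
    return n // 2 if n % 2 == 0 else n


print(simplify_until_fixed_point(40, halve_even))  # prints (5, 4)
```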
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D89981
This reverts commit 4986d5eaff with
proper patches to CMakeLists.txt:
- Add MLIRAsync as a dependency to MLIRAsyncToLLVM
- Add Coroutines as a dependency to MLIRExecutionEngine
Lower from Async dialect to LLVM by converting async regions attached to `async.execute` operations into LLVM coroutines (https://llvm.org/docs/Coroutines.html):
1. Outline all async regions to functions
2. Add LLVM coro intrinsics to mark coroutine begin/end
3. Use MLIR conversion framework to convert all remaining async types and ops to LLVM + Async runtime function calls
All `async.await` operations inside async regions are converted to coroutine suspension points. Await operations outside of a coroutine are converted to blocking wait operations.
Implement a simple runtime to support concurrent execution of coroutines.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D89292
Forward missing attributes when creating the new transfer op otherwise the
builder would use default values.
Differential Revision: https://reviews.llvm.org/D89907
* Adds a new MlirOpPrintingFlags type and supporting accessors.
* Adds a new mlirOperationPrintWithFlags function.
* Adds a full-featured Python Operation.print method with all options and the ability to print directly to files/stdout in text or binary.
* Adds an Operation.get_asm which delegates to print and returns a str or bytes.
* Reworks Operation.__str__ to be based on get_asm.
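The delegation chain between the three methods can be illustrated with a toy class. This is not the binding itself: `ToyOperation` and the `file`/`binary` parameter names are assumptions for the sketch, and the assembly text is supplied up front rather than produced by a printer.

```python
import io
import sys


class ToyOperation:
    """Stand-in for an operation whose assembly is already known as text."""

    def __init__(self, asm):
        self._asm = asm

    def print(self, file=None, binary=False):
        # Single place that produces output; everything else delegates here.
        if file is None:
            file = sys.stdout.buffer if binary else sys.stdout
        file.write(self._asm.encode("utf-8") if binary else self._asm)

    def get_asm(self, binary=False):
        # Print into an in-memory buffer and return its contents.
        buffer = io.BytesIO() if binary else io.StringIO()
        self.print(file=buffer, binary=binary)
        return buffer.getvalue()

    def __str__(self):
        return self.get_asm()


op = ToyOperation('%0 = "foo.op"() : () -> i32')
print(str(op) == op.get_asm())  # prints True
```

Keeping all formatting in `print` means the text and binary paths cannot drift apart between the three entry points.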
Differential Revision: https://reviews.llvm.org/D89848
A "structural" type conversion is one where the underlying ops are
completely agnostic to the actual types involved and simply need to update
their types. An example of this is shape.assuming -- the shape.assuming op
and the corresponding shape.assuming_yield op need to update their types
according to the TypeConverter, but otherwise don't care what type
conversions are happening.
Also, the previous conversion code would not correctly materialize
conversions for the shape.assuming_yield op. This should have caused a
verification failure, but shape.assuming's verifier wasn't calling
RegionBranchOpInterface::verifyTypes (which for reasons can't be called
automatically as part of the trait verification, and requires being
called manually). This patch also adds that verification.
Differential Revision: https://reviews.llvm.org/D89833
A "structural" type conversion is one where the underlying ops are
completely agnostic to the actual types involved and simply need to update
their types. An example of this is scf.if -- the scf.if op and the
corresponding scf.yield ops need to update their types according to the
TypeConverter, but otherwise don't care what type conversions are happening.
To test the structural type conversions, it is convenient to define a
bufferize pass for a dialect, which exercises them nicely.
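What "structural" means here can be sketched with a toy model in which an `if`-like op is a dict and types are strings. The helper names and the tensor-to-memref rule are assumptions for illustration only; the conversion function never inspects what the type converter does, it just applies it to the result types and the matching yield types.

```python
def convert_if_structurally(if_op, convert_type):
    """Rebuild a toy `if` op with converted result and yield types."""
    return {
        "name": if_op["name"],
        "result_types": [convert_type(t) for t in if_op["result_types"]],
        "branches": [
            {"yield_types": [convert_type(t) for t in branch["yield_types"]]}
            for branch in if_op["branches"]
        ],
    }


def tensor_to_memref(type_str):
    # Example converter, loosely in the spirit of a bufferize pass.
    if type_str.startswith("tensor<"):
        return type_str.replace("tensor<", "memref<", 1)
    return type_str


if_op = {
    "name": "scf.if",
    "result_types": ["tensor<4xf32>", "index"],
    "branches": [
        {"yield_types": ["tensor<4xf32>", "index"]},
        {"yield_types": ["tensor<4xf32>", "index"]},
    ],
}
converted = convert_if_structurally(if_op, tensor_to_memref)
print(converted["result_types"])  # prints ['memref<4xf32>', 'index']
```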
Differential Revision: https://reviews.llvm.org/D89757
The documentation claims that an op with the trait FunctionLike has a
single region containing the blocks that correspond to the body of
the function. It then goes on to say that the absence of a region
corresponds to an external function when, in fact, this is represented
by a single empty region. This patch changes the wording in the
documentation to match the implementation.
Signed-off-by: Frej Drejhammar <frej.drejhammar@gmail.com>
Co-authored-by: Frej Drejhammar <frej.drejhammar@gmail.com>
Co-authored-by: Klas Segeljakt <klasseg@kth.se>
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D89868
Historically, custom builder specification in OpBuilder has been accepting the
formal parameter list for the builder method as a raw string containing C++.
While this worked well to connect the signature and the body, this became
problematic when ODS needs to manipulate the parameter list, e.g. to inject
OpBuilder or to trim default values when generating the definition. This has
also become inconsistent with other method declarations, in particular in
interface definitions.
Introduce the possibility to define OpBuilder formal parameters using a
TableGen dag similarly to other methods. Additionally, introduce a mechanism to
declare parameters with default values using an additional class. This
mechanism can be reused in other methods. The string-based builder signature
declaration is deprecated and will be removed after a transition period.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D89470
Docstrings for the `__str__` method in many classes were recycling the constant
string defined for `Type`, even though these classes are not types. Use proper
docstrings instead. Since they are succinct, use string literals instead of
top-level constants to avoid further mistakes.
Differential Revision: https://reviews.llvm.org/D89780
The pybind class typedef for concrete attribute classes was erroneously
deriving all of them from PyAttribute instead of the provided base class. This
has not been triggering any error because only one level of the hierarchy is
currently exposed.
Differential Revision: https://reviews.llvm.org/D89779
Values are ubiquitous in the IR, in particular block arguments and operation
results are Values. Define Python classes for BlockArgument, OpResult and their
common ancestor Value. Define pseudo-container classes for lists of block
arguments and operation results, and use these containers to access the
corresponding values in blocks and operations.
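The class layout described above can be sketched in plain Python. This is a model of the hierarchy and the pseudo-container idea, not the pybind11 bindings: the constructor arguments and the `ValueList` name are hypothetical, and owners are represented by strings.

```python
class Value:
    """Common ancestor: anything that can be used as an operand."""

    def __init__(self, owner, index):
        self.owner = owner
        self.index = index


class BlockArgument(Value):
    """A value defined as the index-th argument of a block."""


class OpResult(Value):
    """A value defined as the index-th result of an operation."""


class ValueList:
    """Read-only sequence view that creates value wrappers on access."""

    def __init__(self, owner, count, value_class):
        self._owner = owner
        self._count = count
        self._value_class = value_class

    def __len__(self):
        return self._count

    def __getitem__(self, index):
        # Negative indices are not supported in this sketch.
        if not 0 <= index < self._count:
            raise IndexError(index)
        return self._value_class(self._owner, index)


arguments = ValueList("block0", 2, BlockArgument)
results = ValueList("op0", 1, OpResult)
print(all(isinstance(v, Value) for v in list(arguments) + list(results)))  # True
```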
Differential Revision: https://reviews.llvm.org/D89778
The CfgTraits abstraction simplifies writing algorithms that are
generic over the type of CFG, and enables writing such algorithms
as regular non-template code that operates on opaque references
to CFG blocks and values.
Implementations of CfgTraits provide operations on the concrete
CFG types, e.g. `IrCfgTraits::BlockRef` is `BasicBlock *`.
CfgInterface is an abstract base class which provides operations
on opaque types CfgBlockRef and CfgValueRef. Those opaque types
encapsulate a `void *`, but the meaning depends on the concrete
CFG type. For example, MachineCfgTraits -- for use with MachineIR
in SSA form -- encodes a Register inside CfgValueRef. Converting
between concrete references and opaque/generic ones is done by
CfgTraits::{fromGeneric,toGeneric}. Convenience methods
CfgTraits::{un}wrap{Iterator,Range} are available as well.
Writing algorithms in terms of CfgInterface adds some overhead
(virtual method calls, plus in some cases it removes the
opportunity to inline iterators), but can be much more convenient
since generic algorithms can be written as non-templates.
This patch adds implementations of CfgTraits for all CFGs on
which dominator trees are calculated, so that the dominator
tree can be ported to this machinery. Only IrCfgTraits (LLVM IR)
and MachineCfgTraits (Machine IR in SSA form) are complete, the
other implementations are limited to the absolute minimum
required to make the upcoming dominator tree changes work.
v5:
- fix MachineCfgTraits::blockdef_iterator and allow it to iterate over
the instructions in a bundle
- use MachineBasicBlock::printName
v6:
- implement predecessors/successors for all CfgTraits implementations
- fix error in unwrapRange
- rename toGeneric/fromGeneric into wrapRef/unwrapRef to have naming
that is consistent with {wrap,unwrap}{Iterator,Range}
- use getVRegDef instead of getUniqueVRegDef
v7:
- std::forward fix in wrapping_iterator
- fix typos
v8:
- cleanup operators on CfgOpaqueType
- address other review comments
Change-Id: Ia75f4f268fded33fca11218a7d578c9aec1f3f4d
Differential Revision: https://reviews.llvm.org/D83088
This still satisfies the constraints required by the affine dialect and
gives more flexibility in what iteration bounds can be used when
lowering to the GPU dialect.
Differential Revision: https://reviews.llvm.org/D89782