Now that dialect constructors are generated in the .cpp file, we can
drop all of the dependent dialect includes from the .h file.
Differential Revision: https://reviews.llvm.org/D124298
As a fallback mechanism, if no entry was supplied for a given address space, the size or alignment for a pointer type with the default address space is returned instead.
This code currently crashes with opaque pointers, as it tries to construct a typed pointer type from the opaque pointer type, leading to a null pointer dereference when fetching the element type.
This patch fixes the issue by handling the opaque pointer cases explicitly.
Differential Revision: https://reviews.llvm.org/D124290
Using opaque pointers in function signatures leads to an attempt to recursively convert all types, including sub types in LLVM types. In the case of LLVM pointers, it may not have a subtype aka element type if it is opaque which would then lead to a null pointer dereference.
Differential Revision: https://reviews.llvm.org/D124291
This change fixes `CollapsedLayoutMap` for cases where the collapsed
dims are size 1. The cases where inner most dims are size 1 and
noncontiguous can be represented by the strided form and therefore can
be allowed. For such cases, the new stride should be of the next entry
in an association whose dimension is not size 1. If the next entry is
dynamic, it's not possible to decide which stride to use at compilation
time and the stride is set to dynamic.
Differential Revision: https://reviews.llvm.org/D124137
Currently, the sequence of Transform dialect operations only supports a single
use of each operand (verified by the `transform.sequence` operation). This was
originally motivated by the need to guard against accessing a payload IR
operation associated with a transform IR value after this operation has likely
been rewritten by a transformation. However, not all Transform dialect
operations rewrite payload IR, in particular the "navigation" operation such as
`transform.pdl_match` do not.
Introduce memory effects to the Transform dialect operations to describe their
effect on the payload IR and the mapping between payload IR opreations and
transform IR values. Use these effects to replace the single-use rule, allowing
repeated reads and disallowing use-after-free, where operations with the "free"
effect are considered to "consume" the transform IR value and rewrite the
corresponding payload IR operations). As an additional improvement, this
enables code motion transformation on the transform IR itself.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D124181
The bubble up logic was written by assuming the slice operation is
always a normal slice that outputs a tensor with the same rank.
Differential Revision: https://reviews.llvm.org/D124283
This allows printing the users of an operation as proposed in the git issue #53286.
To be able to refer to operations with no result, these operations are assigned an
ID in SSANameState.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D124048
Add shape func op for use (primarily) in shape function_library op. Allows
setting default dialect for some simpler authoring. This is a minimal version
of the ops needed.
Differential Revision: https://reviews.llvm.org/D124055
If there is only one single element in the vector, then we can
just extract the element to compute the final result.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D124129
This makes the API easier to use. Also allows us to check for incorrect API usage for easier debugging.
Differential Revision: https://reviews.llvm.org/D124265
vector.broadcast can inject all size one dimensions. If it's
followed by a vector.shape_cast to the original type, we can
cancel the op pair, like cancelling consecutive shape_cast ops.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D124094
* Move Module Bufferization to the bufferization dialect. The implementation is split into `OneShotModuleBufferize.cpp` and `FuncBufferizableOpInterfaceImpl.cpp`, so that the external model implementation can be easily moved to the func dialect in the future.
* Split and clean up test cases. A few test cases are still remaining in Linalg and will be updated separately.
* `linalg.inplaceable` is renamed to `bufferization.writable` to accurately reflect its current usage.
* Attributes and their verifiers are moved from the Linalg dialect to the Bufferization dialect.
* Expand documentation.
* Add a new flag to One-Shot Bufferize to allow for function boundary bufferization.
Differential Revision: https://reviews.llvm.org/D122229
The layout postprocessing step was removed and is now part of the FuncOp bufferization. If the user specified a certain layout map for a tensor function arg, use that layout map directly when bufferizing the function signature. Previously, the bufferization used a generic layout map for every tensor function arg and then updated function signatures and CallOps in a separate step.
Differential Revision: https://reviews.llvm.org/D122228
FuncOps are now less special. They must still be analyzed + bufferized in a certain order, but they are now bufferized same as other ops that have a region: Bufferize the op first (`bufferize` interface method), then bufferize the region body with other bufferization patterns. In the case of FuncOps, the function signature is bufferized together with ReturnOps. Similar to how, e.g., scf.for ops are bufferized together with scf.yield ops.
This change is essentially a reimplementation of the FuncOp bufferization, but mostly NFC from a user's perspective (apart from error messages). This change is in preparation of moving the code to the bufferization dialect.
Differential Revision: https://reviews.llvm.org/D123214
The bufferization driver was previously using a GreedyPatternRewriter. This was problematic because bufferization must traverse ops top-to-bottom. The GreedyPatternRewriter was previously configured via `useTopDownTraversal`, but this was a hack; this API was just meant for performance improvements and should not affect the result of the rewrite.
BEGIN_PUBLIC
No public commit message needed.
END_PUBLIC
Differential Revision: https://reviews.llvm.org/D123618
This patch replaces current fold function with the common constant fold funtion in order to cover the situation of constant splat.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D124236
This patch replaces some code with matchPattern and move them before the constant folder function in order to avoid redundant invoking.
Differential Revision: https://reviews.llvm.org/D124235
It seems more natural than to have it as a static method of ExpandShapeOp.
Also fix a typo ("the the" -> "the").
Differential Revision: https://reviews.llvm.org/D124234
Insert the select op before the combiner op when vectorizing a
reduction loop that needs a mask, so the vectorized reduction loop
can pass isLoopParallel check and be transformed correctly in later
passes.
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D124047
When Location tracking support for block arguments was added, we
discussed various approaches to threading support for this through
function-like argument parsing. At the time, we added a parallel array
of locations that could hold this. It turns out that that approach was
verbose and error prone, roughly no one adopted it.
This patch takes a different approach, adding an optional source
locator to the UnresolvedOperand class. This fits much more naturally
into the standard structure we use for representing locators, and gives
all the function like dialects locator support for free (e.g. see the
test adding an example for the LLVM dialect).
Differential Revision: https://reviews.llvm.org/D124188
Previously, checking that a fix point is reached was counted as a full
iteration. As this "iteration" never changes the IR, this seems counter-
intuitive.
Differential Revision: https://reviews.llvm.org/D123641
This introduces a pair of ops to the Transform dialect that connect it to PDL
patterns. Transform dialect relies on PDL for matching the Payload IR ops that
are about to be transformed. For this purpose, it provides a container op for
patterns, a "pdl_match" op and transform interface implementations that call
into the pattern matching infrastructure.
To enable the caching of compiled patterns, this also provides the extension
mechanism for TransformState. Extensions allow one to store additional
information in the TransformState and thus communicate it between different
Transform dialect operations when they are applied. They can be added and
removed when applying transform ops. An extension containing a symbol table in
which the pattern names are resolved and a pattern compilation cache is
introduced as the first client.
Depends On D123664
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D124007
Prior to this patch, `cloneInto` would do a simple walk over the blocks and contained operations and clone and map them as it encounters them. As finishing touch it then remaps any successor and operands it has remapped during that process.
This is generally fine, but sadly leads to a lot of uses of both operations and blocks from the source region, in the cloned operations in the target region. Those uses lead to writes in the use-def list of the operations, making `cloneInto` never thread safe.
This patch reimplements `cloneInto` in three steps to avoid ever creating any extra uses on elements in the source region:
* It first creates the mapping of all blocks and block operands
* It then clones all operations to create the mapping of all operation results, but does not yet clone any regions or set the operands
* After all operation results have been mapped, it now sets the operations operands and clones their regions.
That way it is now possible to call `cloneInto` from multiple threads if the Region or Operation is isolated-from-above. This allows creating copies of functions or to use `mlir::inlineCall` with the same source region from multiple threads. In the general case, the method is thread-safe if through cloning, no new uses of `Value`s from outside the cloned Operation/Region are created. This can be ensured by mapping any outside operands via the `BlockAndValueMapping` to `Value`s owned by the caller thread.
While I was at it, I also reworked the `clone` method of `Operation` a little bit and added a proper options class to avoid having a `cloneWithoutRegionsAndOperands` method, and be more extensible in the future. `cloneWithoutRegions` is now also a simple wrapper that calls `clone` with the proper options set. That way all the operation cloning code is now contained solely within `clone`.
Differential Revision: https://reviews.llvm.org/D123917
Add async dependencies support for gpu.launch op: this allows specifying
a list of async tokens ("streams") as dependencies for the launch.
Update the GPU kernel outlining pass lowering to propagate async
dependencies from gpu.launch to gpu.launch_func op. Previously, a new
stream was being created and destroyed for a kernel launch. The async
deps support allows the kernel launch to be serialized on an existing
stream.
Differential Revision: https://reviews.llvm.org/D123499
This patch adds lowering support for atomic read and write constructs.
Also added is pointer modelling code to allow FIR pointer like types to
be inferred and converted while lowering.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D122725
Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
This patch handles empty hint value for critical and atomic constructs.
This also adds checks and tests for hint clause on atomic constructs.
Reviewed By: peixin, kiranchandramohan, NimishMishra
Differential Revision: https://reviews.llvm.org/D123186
Add a helper used to implement the build methods generated by ods-gen. The change reduces code size and compilation time since all structured op builders use the same build method. The change reduces the LinalgOps.cpp compilation time from 10.2s to 9.8s (debug build).
Depends On D123987
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D124003
The revision avoids template methods for parsing and printing that are replicated for every named operation. Instead, the new methods take a regionBuilder argument. The revision reduces the compile time of LinalgOps.cpp from 11.2 to 10.2 seconds (debug build).
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D123987
NFC. Drop trailing end of line white space in GPU async ops' printer
whenever the list of async deps is empty.
Reviewed By: mehdi_amini, rriddle
Differential Revision: https://reviews.llvm.org/D123754
Add RegionBranchOpInterface on affine.for op so that transforms relying
on RegionBranchOpInterface can support affine.for. E.g.:
buffer-deallocation pass.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D123568
Writes into tensors that are definied outside of a repetitive region, but with the write happening inside of the repetitive region were previously not considered conflicts. This was incorrect.
E.g.:
```
%0 = ... : tensor<?xf32>
scf.for ... {
"reading_op"(%0) : tensor<?xf32>
%1 = "writing_op"(%0) : tensor<?xf32> -> tensor<?xf32>
...
}
```
In the above example, "writing_op" should be out-of-place.
This commit fixes the bufferization for any op that declares its repetitive semantics via RegionBranchOpInterface.
This patch adds check of supported reduction kind for ScanOp to avoid using and/or/xor for floating point type.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D123977
Introduce a method on PyMlirContext (and plumb it through to Python) to
invalidate all of the operations in the live operations map and clear
it. Since Python has no notion of private data, an end-developer could
reach into some 3rd party API which uses the MLIR Python API (that is
behaving correctly with regard to holding references) and grab a
reference to an MLIR Python Operation, preventing it from being
deconstructed out of the live operations map. This allows the API
developer to clear the map when it calls C++ code which could delete
operations, protecting itself from its users.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D123895
getUpperBound is analogous to getLowerBound(), except for the upper
bound, and is used in range analysis.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D124020
Sequence is an important transform combination primitive that just indicates
transform ops being applied in a row. The simplest version requires fails
immediately if any transformation in the sequence fails. Introducing this
operation allows one to start placing transform IR within other IR.
Depends On D123135
Reviewed By: Mogball, rriddle
Differential Revision: https://reviews.llvm.org/D123664
This patch adds a new function `mlirDenseElementsAttrBFloat16Get()`,
which accepts the shaped type, the number of BFloat16 values, and a
pointer to an array of BFloat16 values, each of which is a `uint16_t`
value.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D123981
The printer is now resilient to invalid IR and will already automatically
fallback to the generic form on invalid IR. Using the generic printer on
pass failure was a conservative option before the printer was made
failsafe.
Reviewed By: lattner, rriddle, jpienaar, bondhugula
Differential Revision: https://reviews.llvm.org/D123915
Fold away gpu.memcpy op when only uses of dest are
the memcpy op in question, its allocation and deallocation
ops.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D121279
Add helper functions to check if an op may be executed multiple times based on RegionBranchOpInterface.
Differential Revision: https://reviews.llvm.org/D123789
This reverts commit af0285122f.
The test "libomp::loop_dispatch.c" on builder
openmp-gcc-x86_64-linux-debian fails from time-to-time.
See #54969. This patch is unrelated.