In cases where arithmetic (addi/muli) ops are performed on an scf.for loops induction variable with a single use, we can fold those ops directly into the scf.for loop.
For example, in the following code:
```
scf.for %i = %c0 to %arg1 step %c1 {
%0 = addi %arg2, %i : index
%1 = muli %0, %c4 : index
%2 = memref.load %arg0[%1] : memref<?xi32>
%3 = muli %2, %2 : i32
memref.store %3, %arg0[%1] : memref<?xi32>
}
```
we can lift `%0` up into the scf.for loop range, as it is the only user of %i:
```
%lb = addi %arg2, %c0 : index
%ub = addi %arg2, %i : index
scf.for %i = %lb to %ub step %c1 {
%1 = muli %0, %c4 : index
%2 = memref.load %arg0[%1] : memref<?xi32>
%3 = muli %2, %2 : i32
memref.store %3, %arg0[%1] : memref<?xi32>
}
```
Reviewed By: mehdi_amini, ftynse, Anthony
Differential Revision: https://reviews.llvm.org/D104289
The patch changes the pretty printed FillOp operand order from output, value to value, output. The change is a follow up to https://reviews.llvm.org/D104121 that passes the fill value using a scalar input instead of the former capture semantics.
Differential Revision: https://reviews.llvm.org/D104356
During slice computation of affine loop fusion, detect one id as the mod
of another id w.r.t a constant in a more generic way. Restrictions on
co-efficients of the ids is removed. Also, information from the
previously calculated ids is used for simplification of affine
expressions, e.g.,
If `id1` = `id2`,
`id_n - divisor * id_q - id_r + id1 - id2 = 0`, is simplified to:
`id_n - divisor * id_q - id_r = 0`.
If `c` is a non-zero integer,
`c*id_n - c*divisor * id_q - c*id_r = 0`, is simplified to:
`id_n - divisor * id_q - id_r = 0`.
Reviewed By: bondhugula, ayzhuang
Differential Revision: https://reviews.llvm.org/D104614
Correct a documentation typo, and delete a duplicate test in
`pdl-to-pdl-interp-rewriter.mlir`.
Reviewed By: pr4tgpt, bondhugula, rriddle
Differential Revision: https://reviews.llvm.org/D104688
This revision refactors the usage of multithreaded utilities in MLIR to use a common
thread pool within the MLIR context, in addition to a new utility that makes writing
multi-threaded code in MLIR less error prone. Using a unified thread pool brings about
several advantages:
* Better thread usage and more control
We currently use the static llvm threading utilities, which do not allow multiple
levels of asynchronous scheduling (even if there are open threads). This is due to
how the current TaskGroup structure works, which only allows one truly multithreaded
instance at a time. By having our own ThreadPool we gain more control and flexibility
over our job/thread scheduling, and in a followup can enable threading more parts of
the compiler.
* The static nature of TaskGroup causes issues in certain configurations
Due to the static nature of TaskGroup, there have been quite a few problems related to
destruction that have caused several downstream projects to disable threading. See
D104207 for discussion on some related fallout. By having a ThreadPool scoped to
the context, we don't have to worry about destruction and can ensure that any
additional MLIR thread usage ends when the context is destroyed.
Differential Revision: https://reviews.llvm.org/D104516
Slowly we are moving toward full support of sparse tensor *outputs*. First
step was support for all-dense annotated "sparse" tensors. This step adds
support for truly sparse tensors, but only for operations in which the values
of a tensor change, but not the nonzero structure (this was refered to as
"simply dynamic" in the [Bik96] thesis).
Some background text was posted on discourse:
https://llvm.discourse.group/t/sparse-tensors-in-mlir/3389/25
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D104577
Operations currently rely on the string name of attributes during attribute lookup/removal/replacement, in build methods, and more. This unfortunately means that some of the most used APIs in MLIR require string comparisons, additional hashing(+mutex locking) to construct Identifiers, and more. This revision remedies this by caching identifiers for all of the attributes of the operation in its corresponding AbstractOperation. Just updating the autogenerated usages brings up to a 15% reduction in compile time, greatly reducing the cost of interacting with the attributes of an operation. This number can grow even higher as we use these methods in handwritten C++ code.
Methods for accessing these cached identifiers are exposed via `<attr-name>AttrName` methods on the derived operation class. Moving forward, users should generally use these methods over raw strings when an attribute name is necessary.
Differential Revision: https://reviews.llvm.org/D104167
The main goal of this commit is to remove the dependency of Standard dialect on the Tensor dialect.
* Rename SubTensorOp -> tensor.extract_slice, SubTensorInsertOp -> tensor.insert_slice.
* Some helper functions are (already) duplicated between the Tensor dialect and the MemRef dialect. To keep this commit smaller, this will be cleaned up in a separate commit.
* Additional dialect dependencies: Shape --> Tensor, Tensor --> Standard
* Remove dialect dependencies: Standard --> Tensor
* Move canonicalization test cases to correct dialect (Tensor/MemRef).
Note: This is a fixed version of https://reviews.llvm.org/D104499, which was reverted due to a missing update to two CMakeFile.txt.
Differential Revision: https://reviews.llvm.org/D104676
Adapt the FillOp definition to use a scalar operand instead of a capture. This patch is a follow up to https://reviews.llvm.org/D104109. As the input operands are in front of the output operands the patch changes the internal operand order of the FillOp. The pretty printed version of the operation remains unchanged though. The patch also adapts the linalg to standard lowering to ensure the c signature of the FillOp remains unchanged as well.
Differential Revision: https://reviews.llvm.org/D104121
The approximation relays on range reduced version y \in [0, pi/2]. An input x will have
the property that sin(x) = sin(y), -sin(y), cos(y), -cos(y) depends on which quadrable x
is in, where sin(y) and cos(y) are approximated with 5th degree polynomial (of x^2).
As a result a single pattern can be used to compute approximation for both sine and cosine.
Reviewed By: ezhulenev
Differential Revision: https://reviews.llvm.org/D104582
The main goal of this commit is to remove the dependency of Standard dialect on the Tensor dialect.
* Rename ops: SubTensorOp --> ExtractTensorOp, SubTensorInsertOp --> InsertTensorOp
* Some helper functions are (already) duplicated between the Tensor dialect and the MemRef dialect. To keep this commit smaller, this will be cleaned up in a separate commit.
* Additional dialect dependencies: Shape --> Tensor, Tensor --> Standard
* Remove dialect dependencies: Standard --> Tensor
* Move canonicalization test cases to correct dialect (Tensor/MemRef).
Differential Revision: https://reviews.llvm.org/D104499
* Remove dependency: Standard --> MemRef
* Add dependencies: GPUToNVVMTransforms --> MemRef, Linalg --> MemRef, MemRef --> Tensor
* Note: The `subtensor_insert_propagate_dest_cast` test case in MemRef/canonicalize.mlir will be moved to Tensor/canonicalize.mlir in a subsequent commit, which moves over the remaining Tensor ops from the Standard dialect to the Tensor dialect.
Differential Revision: https://reviews.llvm.org/D104506
This revision adds a BufferizationAliasInfo which maintains and updates information about which tensors will alias once bufferized, which bufferized tensors are equivalent to others and how to handle clobbers.
Bufferization greedily tries to bufferize inplace by:
1. first trying to bufferize SubTensorInsertOp inplace, in reverse order (these are deemed the most expensives).
2. then trying to bufferize all non SubTensorOp / SubTensorInsertOp, in reverse order.
3. lastly trying to bufferize all SubTensorOp in reverse order.
Reverse order is a heuristic that seems to work nicely because structured tensor codegen very often proceeds by:
1. take a subset of a tensor
2. compute on that subset
3. insert the result subset into the full tensor and yield a new tensor.
BufferizationAliasInfo + equivalence sets + clobber analysis allows bufferizing nested
subtensor/compute/subtensor_insert sequences inplace to a certain extent.
To fully realize inplace bufferization, additional container-containee analysis will be necessary and is left for a subsequent commit.
Differential revision: https://reviews.llvm.org/D104110
This revision adds support for passing a functor to SourceMgrDiagnosticHandler for filtering out FileLineColLocs when emitting a diagnostic. More specifically, this can be useful in situations where there may be large CallSiteLocs with locations that aren't necessarily important/useful for users.
For now the filtering support is limited to FileLineColLocs, but conceptually we could allow filtering for all locations types if a need arises in the future.
Differential Revision: https://reviews.llvm.org/D103649
Introduce the execute_region op that is able to hold a region which it
executes exactly once. The op encapsulates a CFG within itself while
isolating it from the surrounding control flow. Proposal discussed here:
https://llvm.discourse.group/t/introduce-std-inlined-call-op-proposal/282
execute_region enables one to inline a function without lowering out all
other higher level control flow constructs (affine.for/if, scf.for/if)
to the flat list of blocks / CFG form. It thus allows the benefit of
transforms on higher level control flow ops available in the presence of
the inlined calls. The inlined calls continue to benefit from
propagation of SSA values across their top boundary. Functions won’t
have to remain outlined until later than desired. Abstractions like
affine execute_regions, lambdas with implicit captures could be lowered
to this without first lowering out structured loops/ifs or outlining.
But two potential early use cases are of: (1) an early inliner (which
can inline functions by introducing execute_region ops), (2) lowering of
an affine.execute_region, which cleanly maps to an scf.execute_region
when going from the affine dialect to the scf dialect.
Differential Revision: https://reviews.llvm.org/D75837
This is similar to attribute and type interfaces and mostly the same mechanism
(FallbackModel / ExternalModel, ODS generation). There are minor differences in
how the concept-based polymorphism is implemented for operations that are
accounted for by ODS backends, and this essentially adds a test and exposes the
API.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D104294
Based on dicussion in
[this](https://llvm.discourse.group/t/remove-canonicalizer-for-memref-dim-via-shapedtypeopinterface/3641)
thread the pattern to resolve the `memref.dim` of a value that is a
result of an operation that implements the
`InferShapedTypeOpInterface` is moved to a separate pass instead of
running it as a canonicalization pass. This allows shape resolution to
happen when explicitly required, instead of automatically through a
canonicalization.
Differential Revision: https://reviews.llvm.org/D104321
This patch changes the (not recommended) static registration API from:
static PassRegistration<MyPass> reg("my-pass", "My Pass Description.");
to:
static PassRegistration<MyPass> reg;
And the explicit registration from:
void registerPass("my-pass", "My Pass Description.",
[] { return createMyPass(); });
To:
void registerPass([] { return createMyPass(); });
It is expected that Pass implementations overrides the getArgument() method
instead. This will ensure that pipeline description can be printed and parsed
back.
Differential Revision: https://reviews.llvm.org/D104421
Adds an integration test for the SPMM (sparse matrix multiplication) kernel, which multiplies a sparse matrix by a dense matrix, resulting in a dense matrix. This is just a simple modification on the existing matrix-vector multiplication kernel.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D104334
Make store to load fwd condition for -memref-dataflow-opt less
conservative. Post dominance info is not really needed. Add additional
check for common cases.
Differential Revision: https://reviews.llvm.org/D104174
To control the number of outer parallel loops, we need to process the
outer loops first and hence pre-order walk fixes the issue.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D104361
This allows for dialects to do different post-processing depending on operations with the inliner (my use case requires different attribute propagation rules depending on call op). This hook runs before the regular processInlinedBlocks method.
Differential Revision: https://reviews.llvm.org/D104399
In a region with multiple blocks the verifier will try to look for
dominance and may get successor list for blocks, even though a block
may be empty or does not end with a terminator.
Differential Revision: https://reviews.llvm.org/D104411
We have several ways of introducing a scalar invariant value into
linalg generic ops (should we limit this somewhat?). This revision
makes sure we handle all of them correctly in the sparse compiler.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D104335
This is a very careful start with alllowing sparse tensors at the
left-hand-side of tensor index expressions (viz. sparse output).
Note that there is a subtle difference between non-annotated tensors
(dense, remain n-dim, handled by classic bufferization) and all-dense
annotated "sparse" tensors (linearized to 1-dim without overhead
storage, bufferized by sparse compiler, backed by runtime support library).
This revision gently introduces some new IR to facilitate annotated outputs,
to be generalized to truly sparse tensors in the future.
Reviewed By: gussmith23, bixia
Differential Revision: https://reviews.llvm.org/D104074
The index cast operation accepts vector types. Implement its lowering in this patch.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D104280
It may be desirable to provide an interface implementation for an attribute or
a type without modifying the definition of said attribute or type. Notably,
this allows to implement interfaces for attributes and types outside of the
dialect that defines them and, in particular, provide interfaces for built-in
types. Provide the mechanism to do so.
Currently, separable registration requires the attribute or type to have been
registered with the context, i.e. for the dialect containing the attribute or
type to be loaded. This can be relaxed in the future using a mechanism similar
to delayed dialect interface registration.
See https://llvm.discourse.group/t/rfc-separable-attribute-type-interfaces/3637
Depends On D104233
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D104234
The patch replaces the existing capture functionality by scalar operands that have been introduced by https://reviews.llvm.org/D104109. Scalar operands behave as tensor operands except for the fact that they are not indexed. As a result ScalarDefs can be accessed directly as no indexing expression is needed.
The patch only updates the OpDSL. The C++ side is updated by a follow up patch.
Differential Revision: https://reviews.llvm.org/D104220
This doesn't add any canonicalizations, but executes the same
simplification on bufferSemantic linalg.generic ops by using
linalg::ReshapeOp instead of linalg::TensorReshapeOp.
Differential Revision: https://reviews.llvm.org/D103513
This is useful for "build tuple" type ops. In my case, in npcomp, I have
an op:
```
// Result type is `!torch.tuple<!torch.tensor, !torch.tensor>`.
torch.prim.TupleConstruct %0, %1 : !torch.tensor, !torch.tensor
```
and the context is required for the `Torch::TupleType::get` call (for
the case of an empty tuple).
The handling of these FmtContext's in the code is pretty ad-hoc -- I didn't
attempt to rationalize it and just made a targeted fix. As someone
unfamiliar with the code I had a hard time seeing how to more broadly fix
the situation.
Differential Revision: https://reviews.llvm.org/D104274
The parser of generic op did not recognize the output from mlir-opt when there
are multiple outputs. One would wrap the result types with braces, and one would
not. The patch makes the behavior the same.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D104256
Up to now all structured op operands are assumed to be shaped. The patch relaxes this assumption and allows scalar input operands. In contrast to shaped operands scalar operands are not indexed and directly forwarded to the body of the operation. As all other operands, scalar operands are associated to an indexing map that in case of a scalar or a 0D-operand has an empty range.
We will use scalar operands as a replacement for the capture mechanism. In contrast to captures, the approach ensures we can generate the function signature from the operand list and it prevents outdated capture values in case a transformation updates only the capture operand but not the hidden body of a named operation.
Removing captures and updating existing operations such as linalg.fill is left for a later patch.
The patch depends on https://reviews.llvm.org/D103891 and https://reviews.llvm.org/D103890.
Differential Revision: https://reviews.llvm.org/D104109
The padding of such ops is not generated in a vectorized way. Instead, emit a tensor::GenerateOp.
We may vectorize GenerateOps in the future.
Differential Revision: https://reviews.llvm.org/D103879
If the source operand of a linalg.pad_op operation has static shape, vectorize the copying of the source.
Differential Revision: https://reviews.llvm.org/D103747
Currently limited to constant pad values. Any combination of dynamic/static tensor sizes and padding sizes is supported.
Differential Revision: https://reviews.llvm.org/D103679
The generic vectorization pattern handles only those cases, where
low and high padding is zero. This is already handled by a
canonicalization pattern.
Also add a new canonicalization test case to ensure that tensor cast ops
are properly inserted.
A more general vectorization pattern will be added in a subsequent commit.
Differential Revision: https://reviews.llvm.org/D103590
Vectorize linalg.pad_tensor without generating a linalg.init_tensor when consumed by a transfer_write.
Differential Revision: https://reviews.llvm.org/D103137
Vectorize linalg.pad_tensor without generating a linalg.init_tensor when consumed by a subtensor_insert.
Differential Revision: https://reviews.llvm.org/D103780
Vectorize linalg.pad_tensor without generating a linalg.init_tensor when consumed by a transfer_read.
Differential Revision: https://reviews.llvm.org/D103735
Add `tensor.insert` op to make `tensor.extract`/`tensor.insert` work in pairs
for `scalar` domain. Like `subtensor`/`subtensor_insert` work in pairs in
`tensor` domain, and `vector.transfer_read`/`vector.transfer_write` work in
pairs in `vector` domain.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D104139
The commit simplifies affine.if ops :
The affine if operation gets removed if the condition is universally true or false and then/else block is merged with the parent block.
Signed-off-by: Shashij Gupta shashij.gupta@polymagelabs.com
Reviewed By: bondhugula, pr4tgpt
Differential Revision: https://reviews.llvm.org/D104015
Add support to Python bindings for the MLIR execution engine to load a
specified list of shared libraries - for eg. to use MLIR runtime
utility libraries.
Differential Revision: https://reviews.llvm.org/D104009
## Introduction
This proposal describes the new op to be added to the `std` (and later moved `memref`)
dialect called `alloca_scope`.
## Motivation
Alloca operations are easy to misuse, especially if one relies on it while doing
rewriting/conversion passes. For example let's consider a simple example of two
independent dialects, one defines an op that wants to allocate on-stack and
another defines a construct that corresponds to some form of looping:
```
dialect1.looping_op {
%x = dialect2.stack_allocating_op
}
```
Since the dialects might not know about each other they are going to define a
lowering to std/scf/etc independently:
```
scf.for … {
%x_temp = std.alloca …
… // do some domain-specific work using %x_temp buffer
… // and store the result into %result
%x = %result
}
```
Later on the scf and `std.alloca` is going to be lowered to llvm using a
combination of `llvm.alloca` and unstructured control flow.
At this point the use of `%x_temp` is bound to either be either optimized by
llvm (for example using mem2reg) or in the worst case: perform an independent
stack allocation on each iteration of the loop. While the llvm optimizations are
likely to succeed they are not guaranteed to do so, and they provide
opportunities for surprising issues with unexpected use of stack size.
## Proposal
We propose a new operation that defines a finer-grain allocation scope for the
alloca-allocated memory called `alloca_scope`:
```
alloca_scope {
%x_temp = alloca …
...
}
```
Here the lifetime of `%x_temp` is going to be bound to the narrow annotated
region within `alloca_scope`. Moreover, one can also return values out of the
alloca_scope with an accompanying `alloca_scope.return` op (that behaves
similarly to `scf.yield`):
```
%result = alloca_scope {
%x_temp = alloca …
…
alloca_scope.return %myvalue
}
```
Under the hood the `alloca_scope` is going to lowered to a combination of
`llvm.intr.stacksave` and `llvm.intr.strackrestore` that are going to be invoked
automatically as control-flow enters and leaves the body of the `alloca_scope`.
The key value of the new op is to allow deterministic guaranteed stack use
through an explicit annotation in the code which is finer-grain than the
function-level scope of `AutomaticAllocationScope` interface. `alloca_scope`
can be inserted at arbitrary locations and doesn’t require non-trivial
transformations such as outlining.
## Which dialect
Before memref dialect is split, `alloca_scope` can temporarily reside in `std`
dialect, and later on be moved to `memref` together with the rest of
memory-related operations.
## Implementation
An implementation of the op is available [here](https://reviews.llvm.org/D97768).
Original commits:
* Add initial scaffolding for alloca_scope op
* Add alloca_scope.return op
* Add no region arguments and variadic results
* Add op descriptions
* Add failing test case
* Add another failing test
* Initial implementation of lowering for std.alloca_scope
* Fix backticks
* Fix getSuccessorRegions implementation
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D97768
This is the first step to convert vector ops to MMA operations in order to
target GPUs tensor core ops. This currently only support simple cases,
transpose and element-wise operation will be added later.
Differential Revision: https://reviews.llvm.org/D102962
This allows for better interaction with tools (such as mlir-lsp-server), as it separates the IR into separate modules for consecutive dumps.
Differential Revision: https://reviews.llvm.org/D104073
This adds Sdot2d op, which is similar to the usual Neon
intrinsic except that it takes 2d vector operands, reflecting the
structure of the arithmetic that it's performing: 4 separate
4-dimensional dot products, whence the vector<4x4xi8> shape.
This also adds a new pass, arm-neon-2d-to-intr, lowering
this new 2d op to the 1d intrinsic.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D102504
This allows for building an outline of the symbols and symbol tables within the IR. This allows for easy navigations to functions/modules and other symbol/symbol table operations within the IR.
Differential Revision: https://reviews.llvm.org/D103729
This allow creating a matrix with all elements set to a given value. This is
needed to be able to implement a simple dot op.
Differential Revision: https://reviews.llvm.org/D103870
This brings us closer to replacing the LLVM data layout string with a
first-class layout modeling in MLIR.
Depends On D103945
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D103946
Allow gpu ops implementing the async interface to already be async when running the GpuAsyncRegionPass.
That pass threads a 'current token' through a block with ops implementing the gpu async interface.
After this change, existing async ops (returning a !gpu.async.token) set the current token.
Existing synchronous `gpu.wait` ops reset the current token.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D103396
tosa.matmul is a batched matmul, update the lowering for linalg
with the tests.
Reviewed By: sjarus
Differential Revision: https://reviews.llvm.org/D103937
This allows us to remove the `spv.mlir.endmodule` op and
all the code associated with it.
Along the way, tightened the APIs for `spv.module` a bit
by removing some aliases. Now we use `getRegion` to get
the only region, and `getBody` to get the region's only
block.
Reviewed By: mravishankar, hanchung
Differential Revision: https://reviews.llvm.org/D103265
The top-level verifier of data layout specifications delegates verification of
entries with identifier keys to the dialect of the identifier prefix. This flow
was missing a check whether the dialect actually implements the relevant
interface.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D103945
ArmSVE-specific memory operations are needed to generate end-to-end
code for as long as MLIR core doesn't support scalable vectors. This
instructions will be eventually unnecessary, for now they're required
for more complex testing.
Differential Revision: https://reviews.llvm.org/D103535
These `arm_sve.cmp` functions are needed to generate scalable vector
masks as long as scalable vectors are not part of the standard types.
Once in standard, these can be removed and `std.cmp` can be used
instead.
Differential Revision: https://reviews.llvm.org/D103473
LLVM Dialect uses builtin-integer types. The existing LLVM_AnyInteger
type constraint is a dupe of AnyInteger. This patch removes LLVM_AnyInteger
and replaces all usage with AnyInteger.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D103839
Currently canonicalizations of a store and a cast try to fold all casts into the store.
In the case where the operand being stored is itself a cast, this is illegal as the type of the value being stored
will change. This PR fixes this by not checking the value for folding with a cast.
Depends on https://reviews.llvm.org/D103828
Differential Revision: https://reviews.llvm.org/D103829
Now that memref supports arbitrary element types, add support for memref of
memref and make sure it is properly converted to the LLVM dialect. The type
support itself avoids adding the interface to the memref type itself similarly
to other built-in types. This allows the shape, and therefore byte size, of the
memref descriptor to remain a lowering aspect that is easier to customize and
evolve as opposed to sanctifying it in the data layout specification for the
memref type itself.
Factor out the code previously in a testing pass to live in a dedicated data
layout analysis and use that analysis in the conversion to compute the
allocation size for memref of memref. Other conversions will be ported
separately.
Depends On D103827
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D103828
Historically, MemRef only supported a restricted list of element types that
were known to be storable in memory. This is unnecessarily restrictive given
the open nature of MLIR's type system. Allow types to opt into being used as
MemRef elements by implementing a type interface. For now, the interface is
merely a declaration with no methods. Later, methods to query, e.g., the type
size or whether a type can alias elements of another type may be added.
Harden the "standard"-to-LLVM conversion against memrefs with non-builtin
types.
See https://llvm.discourse.group/t/rfc-memref-of-custom-types/3558.
Depends On D103826
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D103827
Some places in the alloc-like op conversion use the converted index type
whereas other places use the pointer-sized integer type, which may not be the
same. Consistently use the converted index type, similarly to other address
calculations.
Reviewed By: pifon2a
Differential Revision: https://reviews.llvm.org/D103826
These `arm_sve.cmp` functions are needed to generate scalable vector
masks as long as scalable vectors are not part of the standard types.
Once in standard, these can be removed and `std.cmp` can be used
instead.
Differential Revision: https://reviews.llvm.org/D103473
We were accidentally only using the first found reference, instead of all of them. This revision fixes this by properly tracking all references to a symbol.
Differential Revision: https://reviews.llvm.org/D103730
For now the hover simply shows the same information as hovering on the operation
name. If necessary this can be tweaked to something symbol specific later.
Differential Revision: https://reviews.llvm.org/D103728
This revision adds support for hover on region operations, by temporarily removing the regions during printing. This revision also tweaks the hover format for operations to include symbol information, now that FuncOp can be shown in the hover.
Differential Revision: https://reviews.llvm.org/D103727
In an operation in the true/false dest of a branch,
one can assume that the operation itself was true/false if
only that edge can reach the operation.
Differential Revision: https://reviews.llvm.org/D101709
This patch convert the if condition on standalone data operation such as acc.update,
acc.enter_data and acc.exit_data to a scf.if with the operation in the if region.
It removes the operation when the if condition is constant and false. It removes the
the condition if it is contant and true.
Conversion to scf.if is done in order to use the translation to LLVM IR dialect out of the box.
Not sure this is the best approach or we should perform this during the translation from OpenACC
to LLVM IR dialect. Any thoughts welcome.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D103325
This patch add canonicalization for the standalone data operation with constant if condition.
It is extracted from this patch D103325.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D103712
Convert data operands from the acc.parallel operation using the same conversion pattern than D102170.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D103337
Implements better naming for results of spv.mlir.addressof ops by making it
inherit from OpAsmOpInterface and implementing the associated
getAsmResultName(...) hook.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D103594
Controlled by a compiler option, if 32-bit indices can be handled
with zero/sign-extention alike (viz. no worries on non-negative
indices), scatter/gather operations can use the more efficient
32-bit SIMD version.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D103632
* Rename PadTensorOpVectorizationPattern to GenericPadTensorOpVectorizationPattern.
* Make GenericPadTensorOpVectorizationPattern a private pattern, to be instantiated via populatePadTensorOpVectorizationPatterns.
* Factor out parts of PadTensorOpVectorizationPattern into helper functions.
This commit prepares PadTensorOpVectorizationPattern for a series of subsequent commits that add more specialized PadTensorOp vectorization patterns.
Differential Revision: https://reviews.llvm.org/D103681
Convert data operands from the acc.data operation using the same conversion pattern than D102170.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D103332
This revision adds assembly state tracking for uses of symbols, allowing for go-to-definition and references support for SymbolRefAttrs.
Differential Revision: https://reviews.llvm.org/D103585
Introduces a test pass that rewrites PadTensorOps with static shapes as a sequence of:
```
linalg.init_tensor // to create output
linalg.fill // to initialize with padding value
linalg.generic // to copy the original contents to the padded tensor
```
The pass can be triggered with:
- `--test-linalg-transform-patterns="test-transform-pad-tensor"`
Differential Revision: https://reviews.llvm.org/D102804
Replace the uses of deprecated Structured Op Interface methods in TestLinalgElementwiseFusion.cpp, TestLinalgFusionTransforms.cpp, and Transforms.cpp. The patch is based on https://reviews.llvm.org/D103394.
Differential Revision: https://reviews.llvm.org/D103528
This simplifies various pieces of code that interact with the pass registry, e.g. this removes the need to register passes to get accurate pass pipelines descriptions when generating crash reproducers.
Differential Revision: https://reviews.llvm.org/D101880
This revision allows for attaching "debug labels" to patterns, and provides to FrozenRewritePatternSet for filtering patterns based on these labels (in addition to the debug name of the pattern). This will greatly simplify the ability to write tests targeted towards specific patterns (in cases where many patterns may interact), will also simplify debugging pattern application by observing how application changes when enabling/disabling specific patterns.
To enable better reuse of pattern rewrite options between passes, this revision also adds a new PassUtil.td file to the Rewrite/ library that will allow for passes to easily hook into a common interface for pattern debugging. Two options are used to seed this utility, `disable-patterns` and `enable-patterns`, which are used to enable the filtering behavior indicated above.
Differential Revision: https://reviews.llvm.org/D102441
Currently the diagnostics reports the file:line:col, but some LSP
frontends require a non-empty range. Report either the range of an
identifier that starts at location, or a range of 1. Expose the id
location to range helper and reuse here.
Differential Revision: https://reviews.llvm.org/D103482
When LLVM and MLIR are built as subprojects (via add_subdirectory),
the CMake configuration that indicates where the MLIR libraries are is
not necessarily in the same cmake/ directory as LLVM's configuration.
This patch removes that assumption about where MLIRConfig.cmake is
located.
(As an additional none, the %llvm_lib_dir substitution was never
defined, and so find_package(MLIR) in the build was succeeding for
other reasons.)
Reviewed By: stephenneuendorffer
Differential Revision: https://reviews.llvm.org/D103276
Support for tensor types in the unrolled version will follow in a separate commit.
Add a new pass option to activate lowering of transfer ops with tensor types (default: deactivated).
Differential Revision: https://reviews.llvm.org/D102666
* A Reducer is a kind of RewritePattern, so it's just the same as
writing graph rewrite.
* ReductionTreePass operates on Operation rather than ModuleOp, so that
* we are able to reduce a nested structure(e.g., module in module) by
* self-nesting.
Reviewed By: jpienaar, rriddle
Differential Revision: https://reviews.llvm.org/D101046
Depthwise convolution should support kernel dilation and non-dilation should
not be a special case. Updated op definition to include a dilation attribute.
This also adds a tosa.depthwise_conv2d lowering to linalg to support the new
linalg behavior.
Differential Revision: https://reviews.llvm.org/D103219
Implements better naming for results of `spv.Constant` ops by making it
inherit from OpAsmOpInterface and implementing the associated
getAsmResultName(...) hook.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D103152
MLIR tools very commonly use `// -----` to split a file into distinct sub documents, that are processed separately. This revision adds support to mlir-lsp-server for splitting MLIR files based on this sigil, and processing them separately.
Differential Revision: https://reviews.llvm.org/D102660
This allows for checking if a given operation may modify/reference/or both a given value. Right now this API is limited to Value based memory locations, but we should expand this to include attribute based values at some point. This is left for future work because the rest of the AliasAnalysis API also has this restriction.
Differential Revision: https://reviews.llvm.org/D101673
Depends On D103109
If any of the tokens/values added to the `!async.group` switches to the error state, than the group itself switches to the error state.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D103203
Depends On D103102
Not yet implemented:
1. Error handling after synchronous await
2. Error handling for async groups
Will be addressed in the followup PRs
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D103109
Support reference counted values implicitly passed (live) only to some of the successors.
Example: if branched to ^bb2 token will leak, unless `drop_ref` operation is properly created
```
^entry:
%token = async.runtime.create : !async.token
cond_br %cond, ^bb1, ^bb2
^bb1:
async.runtime.await %token
async.runtime.drop_ref %token
br ^bb2
^bb2:
return
```
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D103102
In order to allow large matmul operations using the MMA ops we need to chain
operations this is not possible unless "DOp" and "COp" type have matching
layout so remove the "DOp" layout and force accumulator and result type to
match.
Added a test for the case where the MMA value is accumulated.
Differential Revision: https://reviews.llvm.org/D103023
This revision refactors and simplifies the pattern detection logic: thanks to SSA value properties, we can actually look at all the uses of a given value and avoid having to pattern-match specific chains of operations.
A bufferization pattern for subtensor is added and specific inplaceability analysis is implemented for the simple case of subtensor. More advanced use cases will follow.
Differential revision: https://reviews.llvm.org/D102512
* Add `hasCanonicalizer` option to Dialect.
* Initialize canonicalizer with dialect-wide canonicalization patterns.
* Add test case to TestDialect.
Dialect-wide canonicalization patterns are useful if a canonicalization pattern does not conceptually associate with any single operation, i.e., it should not be registered as part of an operation's `getCanonicalizationPatterns` function. E.g., this is the case for canonicalization patterns that match an op interface.
Differential Revision: https://reviews.llvm.org/D103226
Allow support for specifying empty IVs in an `affine.parallel`.
For example:
```
affine.parallel () = () to () {
affine.yield
}
```
Reviewed By: bondhugula, jbruestle
Differential Revision: https://reviews.llvm.org/D102895
The casting ops (sitofp, uitofp, fptosi, fptoui) lowering currently does
not handle n-D vectors. This patch fixes that.
Differential Revision: https://reviews.llvm.org/D103207
Add translation to LLVM IR for the UpdateOp with host and device operands.
Translation is done with call using the runtime. This is done in a similar way as
D101504 and D102381.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D102382
Indexed Generic should be going away in the future. Migrate to linalg.index.
Reviewed By: NatashaKnk, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D103110
This provides a sizable compile time improvement by seeding
the worklist in an order that leads to less iterations of the
worklist.
This patch only changes the behavior of the Canonicalize pass
itself, it does not affect other passes that use the
GreedyPatternRewrite driver
Differential Revision: https://reviews.llvm.org/D103053
Removed some of the older raw "MLIRized" versions that are
no longer needed now that the sparse runtime support library
can focus on the proper sparse tensor types rather than the
opague pointer approach of the past. This avoids legacy...
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D102960
A test in ir.c makes use of casting a void* to an integer type to print it's address. This cast is currently done with the datatype `long` however, which is only guaranteed to be equal to the pointer width on LP64 system. Other platforms may use a length not equal to the pointer width. 64bit Windows as an example uses 32 bit for `long` which does not match the 64 bit pointers.
This also results in clang warning due to `-Wvoid-pointer-to-int-cast`.
Technically speaking, since the test only passes the value 42, it does not cause any issues, but it'd be nice to fix the warning at least.
Differential Revision: https://reviews.llvm.org/D103085