After the Math has been split out of the Standard dialect, the
conversion to the LLVM dialect remained as a huge monolithic pass.
This is undesirable for the same complexity management reasons as having
a huge Standard dialect itself, and is even more confusing given the
existence of a separate dialect. Extract the conversion of the Math
dialect operations to LLVM into a separate library and a separate
conversion pass.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D105702
Use a modeling similar to SCF ParallelOp to support arbitrary parallel
reductions. The two main differences are: (1) reductions are named and declared
beforehand similarly to functions using a special op that provides the neutral
element, the reduction code and optionally the atomic reduction code; (2)
reductions go through memory instead because this is closer to the OpenMP
semantics.
See https://llvm.discourse.group/t/rfc-openmp-reduction-support/3367.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D105358
After the MemRef has been split out of the Standard dialect, the
conversion to the LLVM dialect remained as a huge monolithic pass.
This is undesirable for the same complexity management reasons as having
a huge Standard dialect itself, and is even more confusing given the
existence of a separate dialect. Extract the conversion of the MemRef
dialect operations to LLVM into a separate library and a separate
conversion pass.
Reviewed By: herhut, silvas
Differential Revision: https://reviews.llvm.org/D105625
Simplify vector unrolling pattern to be more aligned with rest of the
patterns and be closer to vector distribution.
The new implementation uses ExtractStridedSlice/InsertStridedSlice
instead of the Tuple ops. After this change the ops based on Tuple don't
have any more used so they can be removed.
This allows removing signifcant amount of dead code and will allow
extending the unrolling code going forward.
Differential Revision: https://reviews.llvm.org/D105381
Added InferReturnTypeComponents for NAry operations, reshape, and reverse.
With the additional tosa-infer-shapes pass, we can infer/propagate shapes
across a set of TOSA operations. Current version does not modify the
FuncOp type by inserting an unrealized conversion cast prior to any new
non-matchin returns.
Differential Revision: https://reviews.llvm.org/D105312
* Split memref.dim into two operations: memref.dim and tensor.dim. Both ops have the same builder interface and op argument names, so that they can be used with templates in patterns that apply to both tensors and memrefs (e.g., some patterns in Linalg).
* Add constant materializer to TensorDialect (needed for folding in affine.apply etc.).
* Remove some MemRefDialect dependencies, make some explicit.
Differential Revision: https://reviews.llvm.org/D105165
Specify the `!async.group` size (the number of tokens that will be added to it) at construction time. `async.await_all` operation can potentially race with `async.execute` operations that keep updating the group, for this reason it is required to know upfront how many tokens will be added to the group.
Reviewed By: ftynse, herhut
Differential Revision: https://reviews.llvm.org/D104780
The patch changes the pretty printed FillOp operand order from output, value to value, output. The change is a follow up to https://reviews.llvm.org/D104121 that passes the fill value using a scalar input instead of the former capture semantics.
Differential Revision: https://reviews.llvm.org/D104356
Correct a documentation typo, and delete a duplicate test in
`pdl-to-pdl-interp-rewriter.mlir`.
Reviewed By: pr4tgpt, bondhugula, rriddle
Differential Revision: https://reviews.llvm.org/D104688
The main goal of this commit is to remove the dependency of Standard dialect on the Tensor dialect.
* Rename SubTensorOp -> tensor.extract_slice, SubTensorInsertOp -> tensor.insert_slice.
* Some helper functions are (already) duplicated between the Tensor dialect and the MemRef dialect. To keep this commit smaller, this will be cleaned up in a separate commit.
* Additional dialect dependencies: Shape --> Tensor, Tensor --> Standard
* Remove dialect dependencies: Standard --> Tensor
* Move canonicalization test cases to correct dialect (Tensor/MemRef).
Note: This is a fixed version of https://reviews.llvm.org/D104499, which was reverted due to a missing update to two CMakeFile.txt.
Differential Revision: https://reviews.llvm.org/D104676
The main goal of this commit is to remove the dependency of Standard dialect on the Tensor dialect.
* Rename ops: SubTensorOp --> ExtractTensorOp, SubTensorInsertOp --> InsertTensorOp
* Some helper functions are (already) duplicated between the Tensor dialect and the MemRef dialect. To keep this commit smaller, this will be cleaned up in a separate commit.
* Additional dialect dependencies: Shape --> Tensor, Tensor --> Standard
* Remove dialect dependencies: Standard --> Tensor
* Move canonicalization test cases to correct dialect (Tensor/MemRef).
Differential Revision: https://reviews.llvm.org/D104499
The index cast operation accepts vector types. Implement its lowering in this patch.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D104280
## Introduction
This proposal describes the new op to be added to the `std` (and later moved `memref`)
dialect called `alloca_scope`.
## Motivation
Alloca operations are easy to misuse, especially if one relies on it while doing
rewriting/conversion passes. For example let's consider a simple example of two
independent dialects, one defines an op that wants to allocate on-stack and
another defines a construct that corresponds to some form of looping:
```
dialect1.looping_op {
%x = dialect2.stack_allocating_op
}
```
Since the dialects might not know about each other they are going to define a
lowering to std/scf/etc independently:
```
scf.for … {
%x_temp = std.alloca …
… // do some domain-specific work using %x_temp buffer
… // and store the result into %result
%x = %result
}
```
Later on the scf and `std.alloca` is going to be lowered to llvm using a
combination of `llvm.alloca` and unstructured control flow.
At this point the use of `%x_temp` is bound to either be either optimized by
llvm (for example using mem2reg) or in the worst case: perform an independent
stack allocation on each iteration of the loop. While the llvm optimizations are
likely to succeed they are not guaranteed to do so, and they provide
opportunities for surprising issues with unexpected use of stack size.
## Proposal
We propose a new operation that defines a finer-grain allocation scope for the
alloca-allocated memory called `alloca_scope`:
```
alloca_scope {
%x_temp = alloca …
...
}
```
Here the lifetime of `%x_temp` is going to be bound to the narrow annotated
region within `alloca_scope`. Moreover, one can also return values out of the
alloca_scope with an accompanying `alloca_scope.return` op (that behaves
similarly to `scf.yield`):
```
%result = alloca_scope {
%x_temp = alloca …
…
alloca_scope.return %myvalue
}
```
Under the hood the `alloca_scope` is going to lowered to a combination of
`llvm.intr.stacksave` and `llvm.intr.strackrestore` that are going to be invoked
automatically as control-flow enters and leaves the body of the `alloca_scope`.
The key value of the new op is to allow deterministic guaranteed stack use
through an explicit annotation in the code which is finer-grain than the
function-level scope of `AutomaticAllocationScope` interface. `alloca_scope`
can be inserted at arbitrary locations and doesn’t require non-trivial
transformations such as outlining.
## Which dialect
Before memref dialect is split, `alloca_scope` can temporarily reside in `std`
dialect, and later on be moved to `memref` together with the rest of
memory-related operations.
## Implementation
An implementation of the op is available [here](https://reviews.llvm.org/D97768).
Original commits:
* Add initial scaffolding for alloca_scope op
* Add alloca_scope.return op
* Add no region arguments and variadic results
* Add op descriptions
* Add failing test case
* Add another failing test
* Initial implementation of lowering for std.alloca_scope
* Fix backticks
* Fix getSuccessorRegions implementation
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D97768
This is the first step to convert vector ops to MMA operations in order to
target GPUs tensor core ops. This currently only support simple cases,
transpose and element-wise operation will be added later.
Differential Revision: https://reviews.llvm.org/D102962
This allow creating a matrix with all elements set to a given value. This is
needed to be able to implement a simple dot op.
Differential Revision: https://reviews.llvm.org/D103870
tosa.matmul is a batched matmul, update the lowering for linalg
with the tests.
Reviewed By: sjarus
Differential Revision: https://reviews.llvm.org/D103937
This allows us to remove the `spv.mlir.endmodule` op and
all the code associated with it.
Along the way, tightened the APIs for `spv.module` a bit
by removing some aliases. Now we use `getRegion` to get
the only region, and `getBody` to get the region's only
block.
Reviewed By: mravishankar, hanchung
Differential Revision: https://reviews.llvm.org/D103265
Now that memref supports arbitrary element types, add support for memref of
memref and make sure it is properly converted to the LLVM dialect. The type
support itself avoids adding the interface to the memref type itself similarly
to other built-in types. This allows the shape, and therefore byte size, of the
memref descriptor to remain a lowering aspect that is easier to customize and
evolve as opposed to sanctifying it in the data layout specification for the
memref type itself.
Factor out the code previously in a testing pass to live in a dedicated data
layout analysis and use that analysis in the conversion to compute the
allocation size for memref of memref. Other conversions will be ported
separately.
Depends On D103827
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D103828
Historically, MemRef only supported a restricted list of element types that
were known to be storable in memory. This is unnecessarily restrictive given
the open nature of MLIR's type system. Allow types to opt into being used as
MemRef elements by implementing a type interface. For now, the interface is
merely a declaration with no methods. Later, methods to query, e.g., the type
size or whether a type can alias elements of another type may be added.
Harden the "standard"-to-LLVM conversion against memrefs with non-builtin
types.
See https://llvm.discourse.group/t/rfc-memref-of-custom-types/3558.
Depends On D103826
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D103827
Some places in the alloc-like op conversion use the converted index type
whereas other places use the pointer-sized integer type, which may not be the
same. Consistently use the converted index type, similarly to other address
calculations.
Reviewed By: pifon2a
Differential Revision: https://reviews.llvm.org/D103826
This patch convert the if condition on standalone data operation such as acc.update,
acc.enter_data and acc.exit_data to a scf.if with the operation in the if region.
It removes the operation when the if condition is constant and false. It removes the
the condition if it is contant and true.
Conversion to scf.if is done in order to use the translation to LLVM IR dialect out of the box.
Not sure this is the best approach or we should perform this during the translation from OpenACC
to LLVM IR dialect. Any thoughts welcome.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D103325
Convert data operands from the acc.parallel operation using the same conversion pattern than D102170.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D103337
Convert data operands from the acc.data operation using the same conversion pattern than D102170.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D103332
Support for tensor types in the unrolled version will follow in a separate commit.
Add a new pass option to activate lowering of transfer ops with tensor types (default: deactivated).
Differential Revision: https://reviews.llvm.org/D102666
Depthwise convolution should support kernel dilation and non-dilation should
not be a special case. Updated op definition to include a dilation attribute.
This also adds a tosa.depthwise_conv2d lowering to linalg to support the new
linalg behavior.
Differential Revision: https://reviews.llvm.org/D103219
Depends On D103102
Not yet implemented:
1. Error handling after synchronous await
2. Error handling for async groups
Will be addressed in the followup PRs
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D103109
In order to allow large matmul operations using the MMA ops we need to chain
operations this is not possible unless "DOp" and "COp" type have matching
layout so remove the "DOp" layout and force accumulator and result type to
match.
Added a test for the case where the MMA value is accumulated.
Differential Revision: https://reviews.llvm.org/D103023
The casting ops (sitofp, uitofp, fptosi, fptoui) lowering currently does
not handle n-D vectors. This patch fixes that.
Differential Revision: https://reviews.llvm.org/D103207
Indexed Generic should be going away in the future. Migrate to linalg.index.
Reviewed By: NatashaKnk, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D103110
This provides a sizable compile time improvement by seeding
the worklist in an order that leads to less iterations of the
worklist.
This patch only changes the behavior of the Canonicalize pass
itself, it does not affect other passes that use the
GreedyPatternRewrite driver
Differential Revision: https://reviews.llvm.org/D103053