This patch extends the GPU kernel outlining pass so that it can take an
optional data layout specification, which is attached to the generated GPU
module operation. If no data layout specification is provided, the default
data layout is used instead.
Reviewed By: herhut, mehdi_amini
Differential Revision: https://reviews.llvm.org/D115722
Having a default value for the lowering strategy of the multi-reduction op has proven
surprising to users. This patch drops the default value so that users have
to explicitly choose the lowering strategy to be applied.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115805
This allows op interface implementations to make decisions based on dialect-specific bufferization state.
This is in preparation for fixing conflict detection of CallOps in ModuleBufferization.
Differential Revision: https://reviews.llvm.org/D115705
The `rewrite` statement allows for rewriting a given root
operation with a block of nested rewriters. The root operation is
not implicitly erased or replaced, and any transformations to it
must be expressed within the nested rewrite block. The inner body
may contain any number of other rewrite statements, variables, or
expressions.
Differential Revision: https://reviews.llvm.org/D115299
This statement acts as a companion to the existing `erase`
statement, and is the corresponding PDLL construct for the
`PatternRewriter::replaceOp` C++ API. This statement replaces a
given operation with a set of values.
Differential Revision: https://reviews.llvm.org/D115298
Tuples are used to group multiple elements into a single
compound value. The values in a tuple can be of any type, and
do not need to be of the same type. There is also no limit to
the number of elements held by a tuple.
Tuples will be used to support multiple results from
Constraints and Rewrites (added in a followup), and will also
make it easier to support more complex primitives (such as
range based maps that can operate on multiple values).
Differential Revision: https://reviews.llvm.org/D115297
An operation expression in PDLL represents an MLIR operation. In
the match section of a pattern, this expression models one of
the input operations to the pattern. In the rewrite section of
a pattern, this expression models one of the operations to
create. The general structure of the operation expression is very
similar to that of the "generic form" of textual MLIR assembly:
```
let root = op<my_dialect.foo>(operands: ValueRange) {attr = attr: Attr} -> (resultTypes: TypeRange);
```
For now we only model the components that are within PDL; as PDL
gains support for blocks and regions, so will this expression.
Differential Revision: https://reviews.llvm.org/D115296
This allows for using literal attributes and types within PDLL,
which simplifies building both constraints and rewriters. For
example, checking if an attribute is true is as simple as
`attr<"true">`.
Differential Revision: https://reviews.llvm.org/D115295
This allows for overriding the metadata of a pattern and
providing information such as the benefit, bounded recursion,
and more in the future.
Differential Revision: https://reviews.llvm.org/D115294
This is a new pattern rewrite frontend designed from the ground
up to support MLIR constructs, and to target PDL. This frontend
language was proposed in https://llvm.discourse.group/t/rfc-pdll-a-new-declarative-rewrite-frontend-for-mlir/4798
This commit starts sketching out the base structure of the
frontend, and is intended to be a minimal starting point for
building up the language. It essentially contains support for
defining a pattern, variables, and erasing an operation. The
features mentioned in the proposal RFC (including IDE support)
will be added incrementally in followup commits.
I intend to upstream the documentation for the language in a
followup when a bit more of the pieces have been landed.
Differential Revision: https://reviews.llvm.org/D115093
Previously, the LogicalResult return value of restoreRow was being ignored in
places where it was expected to always be success. Instead, check the result
and hit an `llvm_unreachable` if it turns out to be failure.
If all the dims are reduction dims, it is already in inner-most/outer-most
reduction form.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D115820
Implements the RegionBranchOpInterface method getNumRegionInvocations for `scf::IfOp` so that, when the condition is constant, the number of region executions can be analyzed by `NumberOfExecutions`.
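As a rough illustration of the rule described above, the following Python sketch models the per-region counts for a two-region conditional. The function name `num_region_invocations` and the use of `None` for "not statically known" are assumptions made for this example; this is not the MLIR C++ API.

```python
from typing import List, Optional


def num_region_invocations(condition: Optional[bool]) -> List[Optional[int]]:
    """Return [then_count, else_count] for an if-like op.

    `condition` is True/False when it is a known constant, or None when
    it is only known at runtime (hypothetical encoding for this sketch).
    """
    if condition is None:
        # Nothing can be said statically about either region.
        return [None, None]
    # Exactly one of the two regions runs, exactly once.
    return [1, 0] if condition else [0, 1]
```

With a constant condition an analysis can conclude that one region runs once and the other never runs; with a runtime condition both counts stay unknown.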
Reviewed By: jpienaar, ftynse
Differential Revision: https://reviews.llvm.org/D115087
* Call `replaceOp` instead of `mapBuffer`.
* Remove bvm and all helper functions around bvm.
* Simplify FuncOp bufferization and rely on existing functionality to generate ToMemrefOps for function BlockArguments.
Differential Revision: https://reviews.llvm.org/D115515
Ops for the signed counterparts "llvm.smin" and "llvm.smax" already exist. This patch adds the unsigned versions as well.
Differential Revision: https://reviews.llvm.org/D115796
After removing the range type, Linalg does not define any type. The revision thus consolidates the LinalgOps.h and LinalgTypes.h into a single Linalg.h header. Additionally, LinalgTypes.cpp is renamed to LinalgDialect.cpp to follow the convention adopted by other dialects such as the tensor dialect.
Depends On D115727
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115728
This patch adds lowering from omp.sections and omp.section (simple lowering along with the nowait clause) to LLVM IR.
Tests for the same are also added.
Reviewed By: ftynse, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D115030
Instead of modifying the existing linalg.tiled_loop op, create a new op with memref input/outputs and delete the old op.
Differential Revision: https://reviews.llvm.org/D115493
Instead of modifying the existing scf.if op, create a new op with memref OpOperands/OpResults and delete the old op.
New allocations / other memrefs can now be yielded from the op. This functionality is deactivated by default and guarded against by AssertDestinationPassingStyle.
Differential Revision: https://reviews.llvm.org/D115491
With VectorType supporting scalable dimensions, we don't need many of
the operations currently present in ArmSVE, like mask generation and
basic arithmetic instructions. Therefore, this patch also gets
rid of those.
Having built-in scalable vector support also simplifies the lowering of
scalable vector dialects down to LLVMIR.
Scalable dimensions are indicated by enclosing them in square brackets:
vector<[4]xf32>
is a scalable vector of 4 single-precision floating-point elements.
More generally, a VectorType can have a set of fixed-length dimensions
followed by a set of scalable dimensions:
vector<2x[4x4]xf32>
is a vector with 2 scalable 4x4 vectors of single-precision floating-point
elements.
The scale of the scalable dimensions can be obtained with the Vector
operation:
%vs = vector.vscale
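To make the relationship between the type and the runtime size concrete, here is a small Python sketch for the one-dimensional case only. It assumes, as with LLVM's scalable vectors, that the runtime element count is the bracketed size multiplied by the scale; the helper name `runtime_num_elements` is hypothetical.

```python
def runtime_num_elements(min_elements: int, vscale: int) -> int:
    """Element count of a 1-D scalable vector such as vector<[4]xf32>.

    min_elements is the size written between square brackets and vscale
    is the runtime multiple (the value vector.vscale would produce).
    """
    if min_elements <= 0 or vscale <= 0:
        raise ValueError("both arguments must be positive")
    return min_elements * vscale


# vector<[4]xf32> for a few possible runtime scales.
for vscale in (1, 2, 4):
    print(vscale, runtime_num_elements(4, vscale))
```

The type itself only fixes the minimum size; the actual length is not known until the scale is read at runtime.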
This change is being discussed in the discourse RFC:
https://llvm.discourse.group/t/rfc-add-built-in-support-for-scalable-vector-types/4484
Differential Revision: https://reviews.llvm.org/D111819
Instead of modifying the existing scf.for op, create a new op with memref OpOperands/OpResults and delete the old op.
New allocations / other memrefs can now be yielded from the loop. This functionality is deactivated by default and guarded against by AssertDestinationPassingStyle.
This change also introduces `replaceOp`, which will be utilized by all other `bufferize` implementations in future commits. Bufferization will then no longer rely on old (pre-bufferize) ops to DCE away. Instead old ops are deleted on the spot. This improves debuggability because there won't be any duplicate ops anymore (bufferized + not-yet-bufferized) when dumping IR during bufferization. It is also less fragile because unbufferized IR can no longer silently "hang around" due to an implementation bug.
Differential Revision: https://reviews.llvm.org/D114926
Added documentation to clarify the purpose of the bufferization to memref pass
and added some remarks.
Differential Revision: https://reviews.llvm.org/D115326
Remove the RangeOp and the RangeType that are not actively used anymore. After removing RangeType, the LinalgTypes header only includes the generated dialect header.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115727
Break up the vectorization pre-condition into the part checking for
static shape and the rest checking if the linalg op is supported by
vectorization. This allows checking if an op could be vectorized if it
had static shapes.
Differential Revision: https://reviews.llvm.org/D115754
While the default value for the amdgpu-flat-work-group-size attribute,
"1, 256", matches the defaults from Clang, some users of the ROCDL dialect,
namely TensorFlow, use larger workgroups, such as 1024. Therefore,
instead of hardcoding this value, we add a rocdl.max_flat_work_group_size
attribute that can be set on GPU kernels to override the default value.
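The override logic can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, and it assumes that the lower bound stays at 1 and that the overridden string uses the same "min, max" format as the default quoted above.

```python
from typing import Optional

# Default upper bound, taken from the "1, 256" default mentioned above.
DEFAULT_MAX_FLAT_WORK_GROUP_SIZE = 256


def flat_work_group_size_attr(max_size: Optional[int] = None) -> str:
    """Build the value for the amdgpu-flat-work-group-size attribute.

    max_size models an optional rocdl.max_flat_work_group_size set on
    the kernel; None means the attribute was not provided.
    """
    if max_size is None:
        max_size = DEFAULT_MAX_FLAT_WORK_GROUP_SIZE
    if max_size < 1:
        raise ValueError("work group size must be at least 1")
    # Assumption: only the upper bound is overridden.
    return f"1, {max_size}"
```

A kernel without the override keeps the Clang-compatible default, while a kernel that needs workgroups of 1024 would pass 1024.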
Reviewed By: whchung
Differential Revision: https://reviews.llvm.org/D115741
Data point using the 3-dim tensor nell-2.tns:
MLIR:
READ FILE INTO COO: 24424.369294 ms ---> improves to ----> 9638.501044 ms
SORT COO BEFORE PACK: 762.834831 ms
PACK COO TO TENSOR: 1243.376245 ms
TACO:
b file read: 13270.9 ms
b pack: 7137.74 ms
b size: (12092 x 9184 x 28818), 925300328 bytes
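For readers who want the derived numbers, the short Python sketch below computes the read speedup and the end-to-end totals from the timings listed above. It assumes the listed phases run back to back and that comparing read + sort + pack against read + pack is the intended comparison; the variable names are made up for this example.

```python
# Timings in milliseconds, copied from the data point above.
mlir_read_before = 24424.369294
mlir_phases = {"read": 9638.501044, "sort": 762.834831, "pack": 1243.376245}
taco_phases = {"read": 13270.9, "pack": 7137.74}

# Ratio of the old MLIR file-read time to the improved one.
read_speedup = mlir_read_before / mlir_phases["read"]
# Sum of all phases, assuming they run sequentially.
mlir_total = sum(mlir_phases.values())
taco_total = sum(taco_phases.values())

print(f"read speedup: {read_speedup:.2f}x")
print(f"MLIR total:   {mlir_total:.1f} ms")
print(f"TACO total:   {taco_total:.1f} ms")
```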
https://github.com/llvm/llvm-project/issues/52679
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D115696
Make the reduction handling in OpenMPIRBuilder compatible with
opaque pointers by explicitly storing the element type in ReductionInfo,
and also passing it to the atomic reduction callback, as at least
the ones in the test need the type there.
This doesn't make things fully compatible yet, as there are other
uses of element types in this class. I also left one
getPointerElementType() call in MLIR, because I'm not familiar
with that area.
Differential Revision: https://reviews.llvm.org/D115638
Instead of printing analysis debug information to stderr, annotate the IR. This makes it easier to understand decisions made by the analysis, especially in larger input IR.
Differential Revision: https://reviews.llvm.org/D115575
Implementing the interface allows querying the size and alignment of an LLVMArrayType, as well as the size and alignment of a struct containing an LLVMArrayType.
The implementation should yield the same results as llvm::DataLayout, including support for over-aligned element types.
There is no customization point for adjusting an array's alignment; it is simply taken from the element type.
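The rule above can be sketched in a few lines of Python. This is a simplified model, not the MLIR implementation: it assumes sizes and alignments are given in bytes, and that each element occupies its size rounded up to its alignment, which is how over-aligned element types are accounted for. The function names are hypothetical.

```python
from typing import Tuple


def align_to(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def array_layout(num_elements: int, elem_size: int, elem_align: int) -> Tuple[int, int]:
    """Return (size, alignment) in bytes for an array type.

    The alignment is taken directly from the element type; the size is
    the element count times the padded element size.
    """
    stride = align_to(elem_size, elem_align)
    return num_elements * stride, elem_align
```

For example, three 4-byte elements that are over-aligned to 8 bytes take up 24 bytes rather than 12, and the array inherits the 8-byte alignment.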
Differential Revision: https://reviews.llvm.org/D115704
This is the second part of https://reviews.llvm.org/D114993 after slicing
into 2 independent commits.
This is needed at the moment to get good codegen from 2d vector.transfer
ops that aim to compile to SIMD load/store instructions but that can
only do so if the whole 2d transfer shape is handled in one piece, in
particular taking advantage of the memref being contiguous row-major.
For instance, if the target architecture has 128-bit SIMD then we would
expect that contiguous row-major transfers of <4x4xi8> map to one SIMD
load/store instruction each.
The current generic lowering of multi-dimensional vector.transfer ops
can't achieve that because it peels dimensions one by one, so a transfer
of <4x4xi8> becomes 4 transfers of <4xi8>.
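The arithmetic behind this example can be checked with a small Python sketch. It is a back-of-the-envelope model only: it ignores alignment, masking, and whether the memref really is contiguous, and the function names are invented for this illustration.

```python
from math import prod
from typing import Sequence, Tuple


def total_bits(shape: Sequence[int], elem_bits: int) -> int:
    """Number of bits moved by a transfer of the given vector shape."""
    return prod(shape) * elem_bits


def peeled(shape: Sequence[int], elem_bits: int) -> Tuple[int, int]:
    """(count, bits) of the 1-D transfers left after peeling outer dims."""
    return prod(shape[:-1]), shape[-1] * elem_bits


def flattened_simd_ops(shape: Sequence[int], elem_bits: int, simd_bits: int) -> int:
    """SIMD load/stores needed if the whole shape is moved as one 1-D vector."""
    bits = total_bits(shape, elem_bits)
    return -(-bits // simd_bits)  # integer ceiling division


# <4x4xi8> on a 128-bit SIMD target.
print(peeled((4, 4), 8))
print(flattened_simd_ops((4, 4), 8, 128))
```

Peeling leaves four 32-bit transfers, each using a quarter of a 128-bit register, whereas the flattened form moves the same 128 bits in a single operation.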
The new patterns here are only enabled for now by
-test-vector-transfer-flatten-patterns.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114993
This is the first part of https://reviews.llvm.org/D114993 which has been
split into small independent commits.
This is needed at the moment to get good codegen from 2d vector.transfer
ops that aim to compile to SIMD load/store instructions but that can
only do so if the whole 2d transfer shape is handled in one piece, in
particular taking advantage of the memref being contiguous row-major.
For instance, if the target architecture has 128-bit SIMD then we would
expect that contiguous row-major transfers of <4x4xi8> map to one SIMD
load/store instruction each.
The current generic lowering of multi-dimensional vector.transfer ops
can't achieve that because it peels dimensions one by one, so a transfer
of <4x4xi8> becomes 4 transfers of <4xi8>.
The new patterns here are only enabled for now by
-test-vector-transfer-flatten-patterns.
Reviewed By: nicolasvasilache
* Generalizes passes linalg-detensorize, linalg-fold-unit-extent-dims, convert-elementwise-to-linalg.
* I feel that more work could be done in the future (i.e. make FunctionLike into a proper OpInterface and extend actions in dialect conversion to be trait based), and this patch would be a good record of why that is useful.
* Note for downstreams:
  * Since these passes are now generic, they do not automatically nest with pass managers set up for implicit nesting.
  * The Detensorize pass must run on a FunctionLike, and this requires explicit nesting.
* Addressed missed comments from the original patch and, per suggestion, removed the assert on FunctionLike in ElementwiseToLinalg and DropUnitDims.cpp, which was also what caused the integration test to fail.
This reverts commit aa8815e42e.
Differential Revision: https://reviews.llvm.org/D115671
Explores various sparsity combinations of
the SDMM kernel and verifies that the computed
result is the same for all cases.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D115476
Add convertFromMLIRSparseTensor to the supporting C shared library to convert
SparseTensorStorage to COO-flavor format.
Add Python routine sparse_tensor_to_coo_tensor to convert a sparse tensor storage
pointer to NumPy values for a COO-flavor format tensor.
Add a Python test for sparse tensor output.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D115557
* Generalizes passes linalg-detensorize, linalg-fold-unit-extent-dims, convert-elementwise-to-linalg.
* I feel that more work could be done in the future (i.e. make FunctionLike into a proper OpInterface and extend actions in dialect conversion to be trait based), and this patch would be a good record of why that is useful.
* Note for downstreams:
  * Since these passes are now generic, they do not automatically nest with pass managers set up for implicit nesting.
  * If running them over nested functions, you must nest explicitly. Upstream has adopted this style but *-opt still has some uses of implicit pipelines via args. See tests for argument changes needed.
Differential Revision: https://reviews.llvm.org/D115645
Adapt the LinalgStrategyVectorizationPattern pass to apply the vectorization patterns in two stages. The change ensures the generic pad tensor op vectorization pattern does not run too early. Additionally, the revision adds the transfer op canonicalization patterns to the set of applied patterns, since they are needed to enable efficient vectorization for rank-reduced convolutions.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115627
This gives us better debug printing, as it supports indentation
levels and other nice features.
Reviewed By: Hardcode84
Differential Revision: https://reviews.llvm.org/D115583
The previous "optimization" that tries to reuse an existing block as the
selection header block can be problematic for deserialization
because it effectively pulls previous ops in the selection op's
enclosing block into the selection op's header. When deserializing,
those ops will be placed in the selection op's region. If any of
the previous ops is used after the selection op, it will break. That
is, the following IR cannot round trip:
```mlir
^bb:
%def = ...
spv.mlir.selection { ... }
%use = spv.SomeOp %def
```
This commit removes the "optimization" and always creates new blocks
for the selection header.
Along the way, also improved error reporting in deserialization
by turning asserts into proper errors and adding a check for uses outside
of sunk structured control flow region blocks.
Reviewed By: Hardcode84
Differential Revision: https://reviews.llvm.org/D115582