1318 Commits

Author SHA1 Message Date
gysit 9912bed730 [mlir][linalg] Remove RangeOp and RangeType.
Remove the RangeOp and the RangeType that are not actively used anymore. After removing RangeType, the LinalgTypes header only includes the generated dialect header.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D115727
2021-12-15 07:19:10 +00:00
Lei Zhang 96130b5dc7 [mlir][spirv] Support size-1 vector/tensor constant during conversion
Reviewed By: ThomasRaoux, mravishankar

Differential Revision: https://reviews.llvm.org/D115518
2021-12-14 15:58:08 -05:00
Alexander Belyaev 15f8f3e20a [mlir] Split std.rank into tensor.rank and memref.rank.
Move `std.rank` similarly to how `std.dim` was moved to TensorOps and MemRefOps.

Differential Revision: https://reviews.llvm.org/D115665
2021-12-14 10:15:55 +01:00
River Riddle 233e9476d8 [mlir:PDL] Allow non-bound pdl.attribute/pdl.type operations that create constants
This allows for passing in these attributes/types to constraints/rewrites as arguments.

Differential Revision: https://reviews.llvm.org/D114817
2021-12-10 19:38:43 +00:00
Alexander Belyaev b618880e7b [mlir] Move `linalg.tensor_expand/collapse_shape` to TensorDialect.
RFC: https://llvm.discourse.group/t/rfc-reshape-ops-restructuring/3310

linalg.fill gets a canonicalizer, because `FoldFillWithTensorReshape` cannot be moved to tensorops (it uses linalg::FillOp inside). Before it was listed as a canonicalization pattern for the reshape operations, now it became a canonicalization for FillOp.

Differential Revision: https://reviews.llvm.org/D115502
2021-12-10 12:11:48 +01:00
Mehdi Amini 79a0330a52 Fix crash from use of a temporary after its scope exit
Introduced in D110448 and broke some bots (reported by ASAN).

Differential Revision: https://reviews.llvm.org/D110448
2021-12-10 05:04:23 +00:00
Krzysztof Drewniak e1da62910e [MLIR][GPU] Define gpu.printf op and its lowerings
- Define a gpu.printf op, which can be lowered to any GPU printf() support (which is present in CUDA, HIP, and OpenCL). This op only supports constant format strings and scalar arguments
- Define the lowering of gpu.printf to a call to printf() (which is what is required for AMD GPUs when using OpenCL) as well as to the hostcall interface present in the AMD Open Compute device library, which is the interface present when kernels are running under HIP.
- Add a "runtime" enum that allows specifying which of the possible runtimes a ROCDL kernel will be executed under or that the runtime is unknown. This enum controls how gpu.printf is lowered

This change does not enable lowering for Nvidia GPUs, but such a lowering should be possible in principle.

And:
[MLIR][AMDGPU] Always set amdgpu-implicitarg-num-bytes=56 on kernels

This is something that Clang always sets on both OpenCL and HIP kernels, and failing to include it causes mysterious crashes with printf() support.

In addition, revert the max-flat-work-group-size to (1, 256) to avoid triggering bugs in the AMDGPU backend.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D110448
2021-12-09 15:54:31 +00:00
Rob Suderman 23149d522b [mlir] Added ctlz and cttz to math dialect and LLVM dialect
Count leading/trailing zeros are existing LLVM intrinsics. Added LLVM
support for the intrinsics with lowerings from the math dialect to LLVM
dialect.
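
As a rough illustration of what these two operations compute, the sketch below counts leading and trailing zero bits of a 32-bit value with plain loops. The function names are hypothetical and are not taken from the patch, and returning 32 for a zero input is an assumption made here for definiteness.

```cpp
#include <cstdint>

// Hypothetical reference helpers for illustration; not code from the patch.

// Number of zero bits above the most significant set bit.
unsigned countLeadingZeros32(std::uint32_t x) {
  unsigned n = 0;
  for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0;
       mask >>= 1)
    ++n;
  return n;  // 32 when x == 0 (assumed convention)
}

// Number of zero bits below the least significant set bit.
unsigned countTrailingZeros32(std::uint32_t x) {
  unsigned n = 0;
  for (std::uint32_t mask = 1u; mask != 0 && (x & mask) == 0; mask <<= 1)
    ++n;
  return n;  // 32 when x == 0 (assumed convention)
}
```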

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D115206
2021-12-08 14:32:15 -08:00
Butygin d8fce785de [mlir][spirv] math.erf OpenCL lowering
Differential Revision: https://reviews.llvm.org/D115335
2021-12-08 21:59:46 +03:00
Mehdi Amini be0a7e9f27 Adjust "end namespace" comment in MLIR to match new agreed coding style
See D115115 and this mailing list discussion:
https://lists.llvm.org/pipermail/llvm-dev/2021-December/154199.html

Differential Revision: https://reviews.llvm.org/D115309
2021-12-08 06:05:26 +00:00
Rob Suderman c5fef77bc3 [mlir] Add CtPop to MathOps with lowering to LLVM
math.ctpop maps to the llvm.ctpop intrinsic.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D114998
2021-12-06 11:54:20 -08:00
Alex Zinenko d64b3e47ba [mlir] Avoid needlessly converting LLVM named structs with compatible elements
Conversion of LLVM named structs leads to them being renamed since we cannot
modify the body of the struct type once it is set. Previously, this applied to
all named struct types, even if their element types were not affected by the
conversion. Make this behavior only applicable when element types are changed.
This requires making the LLVM dialect type-compatibility check recursively look
at the element types (arguably, it should have been doing that since the moment
the LLVM dialect type system stopped being closed). In addition, have a more
lax check for outer types only to avoid repeated check when necessary (e.g.,
parser, verifiers that are going to also look at the inner type).

Reviewed By: wsmoses

Differential Revision: https://reviews.llvm.org/D115037
2021-12-06 13:42:11 +01:00
Alex Zinenko 9dd1f8dfdd [mlir] support recursive type conversion of named LLVM structs
A previous commit added support for converting elemental types contained in
LLVM dialect types in case they were not compatible with the LLVM dialect. It
was missing support for named structs as they could be recursive, which was not
supported by the conversion infra. Now that it is, add support for converting
such named structs.

Depends On D113579

Reviewed By: wsmoses

Differential Revision: https://reviews.llvm.org/D113580
2021-12-03 12:41:40 +01:00
Michal Terepeta 1423e8bf5d [mlir][Vector] Support 0-D vectors in `BitCastOp`
The implementation only allows bit-casting between two 0-D vectors. We could
probably support casting from/to vectors like `vector<1xf32>`, but I wasn't
convinced that this would be important and it would require breaking the
invariant that `BitCastOp` works only on vectors with equal rank.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114854
2021-12-03 08:55:59 +00:00
Nicolas Vasilache c537a94334 [mlir][Vector] Thread 0-d vectors through vector.transfer ops
This revision adds 0-d vector support to vector.transfer ops.
In the process, numerous cleanups are applied, in particular around normalizing
and reducing the number of builders.

Reviewed By: ThomasRaoux, springerm

Differential Revision: https://reviews.llvm.org/D114803
2021-12-01 16:49:43 +00:00
Jacques Pienaar 62fea88bc5 [mlir] Update accessors prefixed form (NFC)
2021-11-30 19:42:37 -08:00
Stephen Neuendorffer 7386364889 Revert "[MLIR] Update Vector To LLVM conversion to be aware of assume_alignment"
This reverts commit 29a50c5864.

After LLVM lowering, the original patch incorrectly moved alignment
information across an unconstrained GEP operation.  This is only correct
for some index offsets in the GEP.  It seems that the best approach is,
in fact, to rely on LLVM to propagate information from the llvm.assume()
to users.
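
The arithmetic behind "only correct for some index offsets" can be sketched as follows. This is an illustrative helper with a hypothetical name, not code from either patch, and it assumes the base alignment is a power of two and the offset is a byte offset.

```cpp
#include <cstdint>

// Hypothetical helper: the largest power-of-two alignment that is still
// guaranteed for (base + offset) when `base` is aligned to `baseAlign`.
// Assumes `baseAlign` is a power of two and `offset` is in bytes.
std::uint64_t alignmentAfterOffset(std::uint64_t baseAlign,
                                   std::uint64_t offset) {
  if (offset == 0)
    return baseAlign;
  // Lowest set bit of the offset, i.e. the largest power of two dividing it.
  std::uint64_t offsetAlign = offset & (~offset + 1);
  return offsetAlign < baseAlign ? offsetAlign : baseAlign;
}
```

For example, a 16-byte-aligned base plus an offset of 4 bytes is only known to be 4-byte aligned, so carrying the original alignment across an arbitrary offset overstates it.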

Thanks to Thomas Raoux for catching this.
2021-11-30 15:18:22 -08:00
Nicolas Vasilache a08b750ce9 [mlir][tensor] InsertSliceOp verification.
This revision reintroduces tensor.insert_slice verification which seems
to have vanished over time: a verifier was initially introduced in cf9503c1b7
but for some reason the invalid.mlir was not properly updated; as time passed the verifier was not called anymore and later the code was deleted.

As a consequence, a non-negligible portion of tests has run astray using invalid
tensor.insert_slice semantics and needed to be fixed.

Also, extract isRankReducedType from TensorOps for better reuse
Originally, this facility was used by both tensor and memref forms but
it got copied around as dialects were split.

Differential Revision: https://reviews.llvm.org/D114715
2021-11-30 20:37:06 +00:00
Alexander Belyaev f910aa9105 [mlir] Fix BufferizationToMemRef build.
2021-11-30 13:10:54 +01:00
Julian Gross ae1ea0bead [mlir] Decompose Bufferization Clone operation into Memref Alloc and Copy.
This patch introduces a new conversion to convert bufferization.clone operations
into a memref.alloc and a memref.copy operation. This transformation is needed to
transform all remaining clones which "survive" all previous transformations, before
a given program is lowered further (to LLVM e.g.). Otherwise, these operations
cannot be handled anymore and lead to compile errors.
See: https://llvm.discourse.group/t/bufferization-error-related-to-memref-clone/4665

Differential Revision: https://reviews.llvm.org/D114233
2021-11-30 10:15:56 +01:00
Stanislav Funiak a19e163526 Fixed broken build under GCC 5.4.
This diff fixes broken build caused by D108550. Under GCC 5, auto lambdas that capture this require `this->` for member calls.
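
A minimal sketch of the pattern described above, using a hypothetical class rather than the code touched by the diff: inside a generic (`auto`) lambda that captures `this`, the member call is spelled with an explicit `this->`.

```cpp
#include <vector>

// Hypothetical class, for illustration only.
struct Accumulator {
  int total = 0;

  void add(int x) { total += x; }

  void addAll(const std::vector<int> &values) {
    // Generic lambda capturing `this`. Writing `this->add(v)` instead of a
    // bare `add(v)` is the form the commit message says GCC 5 requires.
    auto addOne = [this](auto v) { this->add(v); };
    for (int v : values)
      addOne(v);
  }
};
```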

Reviewed By: bondhugula

Differential Revision: https://reviews.llvm.org/D114659
2021-11-27 09:03:27 +05:30
Michal Terepeta d0f927121e [mlir][Standard] Support 0-D vectors in `SplatOp`
This changes the op to produce `AnyVectorOfAnyRank` and implements this by just
inserting the element (skipping the shuffle that we do for the 1-D case).

Depends On D114549

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114598
2021-11-26 17:05:15 +00:00
Benjamin Kramer 8521850f20 Provide a definition for OperationPosition::kDown
This isn't necessary in C++17, but C++14 still requires it.
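
The language rule involved can be sketched with a hypothetical struct (the name, value, and helper below are illustrative, not the actual `OperationPosition` code): a `static constexpr` data member that is odr-used needs a namespace-scope definition in C++14, while in C++17 the member is implicitly inline and the extra definition is redundant.

```cpp
#include <algorithm>

// Hypothetical stand-in; the value of kDown here is illustrative.
struct Position {
  // In-class declaration with an initializer.
  static constexpr unsigned kDown = 0;
};

// Namespace-scope definition. Required in C++14 once the member is odr-used;
// redundant but still accepted in C++17.
constexpr unsigned Position::kDown;

unsigned atLeastDown(unsigned depth) {
  // std::max takes its arguments by const reference, which odr-uses kDown.
  return std::max(depth, Position::kDown);
}
```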
2021-11-26 14:11:59 +01:00
Benjamin Kramer 1b0312d280 [PDL] fix unused variable warning in Release builds
2021-11-26 14:11:58 +01:00
Stanislav Funiak a76ee58f3c Multi-root PDL matching using upward traversals.
This is commit 4 of 4 for the multi-root matching in PDL, discussed in https://llvm.discourse.group/t/rfc-multi-root-pdl-patterns-for-kernel-matching/4148 (topic flagged for review).

This PR integrates the various components (root ordering algorithm, nondeterministic execution of PDL bytecode) to implement multi-root PDL matching. The main idea is for the pattern to specify multiple candidate roots. The PDL-to-PDLInterp lowering selects one of these roots and "hangs" the pattern from this root, traversing the edges downwards (from an operation to its operands) when possible and upwards (from a value to its uses) when needed. The root is selected by invoking the optimal matching multiple times, once for each candidate root, and the connectors are determined from the optimal matching. The costs in the directed graph are equal to the number of upward edges that need to be traversed when connecting the given two candidate roots. It can be shown that, for this choice of the cost function, "hanging" the pattern from an inner node is no better than from the optimal root.

The following four main additions were implemented as a part of this PR:
1. OperationPos predicate has been extended to allow tracing the operation accepting a value (the opposite of operation defining a value).
2. Predicate checking if two values are not equal - this is useful to ensure that we do not traverse the edge back downwards after we traversed it upwards.
3. Function for building the cost graph among the candidate roots.
4. Updated buildPredicateList, building the predicates after the optimal branching has been determined.

Testing: unit tests (an integration test to follow once the stack of commits has landed)

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D108550
2021-11-26 18:11:37 +05:30
Stanislav Funiak 6df7cc7f47 Implementation of the root ordering algorithm
This is commit 3 of 4 for the multi-root matching in PDL, discussed in https://llvm.discourse.group/t/rfc-multi-root-pdl-patterns-for-kernel-matching/4148 (topic flagged for review).

We form a graph over the specified roots, provided in `pdl.rewrite`, where two roots are connected by a directed edge if the target root can be connected (via a chain of operations) in the underlying pattern to the source root. We place a restriction that the path connecting the two candidate roots must only contain the nodes in the subgraphs underneath these two roots. The cost of an edge is the smallest number of upward traversals (edges) required to go from the source to the target root, and the connector is a `Value` in the intersection of the two subtrees rooted at the source and target root that results in that smallest number of such upward traversals. Optimal root ordering is then formulated as the problem of finding a spanning arborescence (i.e., a directed spanning tree) of minimal weight.

In order to determine the spanning arborescence (directed spanning tree) of minimum weight, we use the [Edmonds' algorithm](https://en.wikipedia.org/wiki/Edmonds%27_algorithm). The worst-case computational complexity of this algorithm is O(_N_^3) for a single root, where _N_ is the number of specified roots. The `pdl`-to-`pdl_interp` lowering calls this algorithm as a subroutine _N_ times (once for each candidate root), so the overall complexity of root ordering is O(_N_^4). If needed, this complexity could be reduced to O(_N_^3) with a more efficient algorithm. However, note that the underlying implementation is very efficient, and _N_ in our instances tends to be very small (<10). Therefore, we believe that the proposed (asymptotically suboptimal) implementation will suffice for now.
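
The formulation above can be sketched with a compact cost-only variant of the Chu-Liu/Edmonds contraction scheme. This is a standalone illustration, not the implementation from the patch: the `Edge` struct and both function names are hypothetical, costs are assumed to be non-negative, and the sketch returns only the total cost rather than the chosen connectors.

```cpp
#include <limits>
#include <utility>
#include <vector>

// Hypothetical edge type: a directed edge with a non-negative cost.
struct Edge { int from, to; long long cost; };

// Cost of a minimum-weight spanning arborescence rooted at `root`,
// or -1 if some node cannot be reached from `root`.
long long minArborescenceCost(int n, int root, std::vector<Edge> edges) {
  const long long kInf = std::numeric_limits<long long>::max();
  long long total = 0;
  while (true) {
    // 1. Pick the cheapest incoming edge of every node.
    std::vector<long long> inCost(n, kInf);
    std::vector<int> parent(n, -1);
    for (const Edge &e : edges)
      if (e.from != e.to && e.cost < inCost[e.to]) {
        inCost[e.to] = e.cost;
        parent[e.to] = e.from;
      }
    for (int v = 0; v < n; ++v)
      if (v != root && inCost[v] == kInf)
        return -1;
    inCost[root] = 0;

    // 2. Add the chosen edges and detect cycles among them.
    std::vector<int> comp(n, -1), seen(n, -1);
    int numComps = 0;
    for (int v = 0; v < n; ++v) {
      total += inCost[v];
      int x = v;
      while (x != root && seen[x] != v && comp[x] == -1) {
        seen[x] = v;
        x = parent[x];
      }
      if (x != root && comp[x] == -1) {  // x lies on a newly found cycle
        for (int y = parent[x]; y != x; y = parent[y])
          comp[y] = numComps;
        comp[x] = numComps++;
      }
    }
    if (numComps == 0)
      return total;  // the chosen edges already form an arborescence

    // 3. Contract each cycle into one node and reduce the edge costs.
    for (int v = 0; v < n; ++v)
      if (comp[v] == -1)
        comp[v] = numComps++;
    std::vector<Edge> contracted;
    for (const Edge &e : edges)
      if (comp[e.from] != comp[e.to])
        contracted.push_back({comp[e.from], comp[e.to], e.cost - inCost[e.to]});
    edges = std::move(contracted);
    n = numComps;
    root = comp[root];
  }
}

// Runs the subroutine once per candidate root, as the message describes,
// and returns the cheapest root (or -1 if no root reaches every node).
int pickBestRoot(int n, const std::vector<Edge> &edges) {
  int best = -1;
  long long bestCost = -1;
  for (int r = 0; r < n; ++r) {
    long long c = minArborescenceCost(n, r, edges);
    if (c >= 0 && (best == -1 || c < bestCost)) {
      best = r;
      bestCost = c;
    }
  }
  return best;
}
```

Each contraction round is linear in the number of edges, and there are at most _N_ rounds, so with up to _N_^2 edges a single call stays within the O(_N_^3) bound quoted above.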

Testing: a unit test of the algorithm

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D108549
2021-11-26 18:11:37 +05:30
Michal Terepeta cc311a155a [mlir][Vector] Support 0-D vectors in `VectorPrintOpConversion`
Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114549
2021-11-25 20:12:18 +00:00
Butygin 8dae0b6b6c [mlir][spirv] arith::RemSIOp OpenCL lowering
Differential Revision: https://reviews.llvm.org/D114524
2021-11-25 12:44:06 +03:00
Butygin 75a1bee05d [mlir][spirv] Add math to OpenCL conversion
Differential Revision: https://reviews.llvm.org/D113780
2021-11-24 02:31:21 +03:00
Rob Suderman 54eec7cafc [mlir][tosa] Separate tosa.transpose_conv decomposition and added stride support
Transpose convolution decomposition is now performed in a separate pass. This
allows padding / constant propagation to be performed at the TOSA level. It
also adds support for striding when there is no dilation.

Differential Revision: https://reviews.llvm.org/D114409
2021-11-23 12:16:44 -08:00
Nicolas Vasilache 3ff4e5f2a4 [mlir][Vector] Thread 0-d vectors through InsertElementOp.
This revision makes concrete use of 0-d vectors to extend the semantics of
InsertElementOp.

Reviewed By: dcaballe, pifon2a

Differential Revision: https://reviews.llvm.org/D114388
2021-11-23 12:55:11 +00:00
Nicolas Vasilache e7026aba00 [mlir][Vector] Thread 0-d vectors through ExtractElementOp.
This revision starts making concrete use of 0-d vectors to extend the semantics of
ExtractElementOp.
In the process a new VectorOfAnyRank Tablegen OpBase.td is added to allow progressive transition to supporting 0-d vectors by gradually opting in.

Differential Revision: https://reviews.llvm.org/D114387
2021-11-23 12:39:44 +00:00
Thomas Raoux 47555d73f6 [mlir][gpu] Extend shuffle op modes and add nvvm lowering
Add up, down and idx modes to gpu shuffle ops, also change the mode from
string to enum

Differential Revision: https://reviews.llvm.org/D114188
2021-11-19 11:14:31 -08:00
Mogball 7c5ecc8b7e [mlir][vector] Insert/extract element can accept index
`vector::InsertElementOp` and `vector::ExtractElementOp` have had their `position`
operand changed to accept `AnySignlessIntegerOrIndex` for better operability with
operations that use `index`, such as affine loops.

LLVM's `extractelement` and `insertelement` can also accept `i64`, so lowering
directly to these operations without explicitly inserting casts is allowed. SPIRV's
equivalent ops can also accept `i64`.

Reviewed By: nicolasvasilache, jpienaar

Differential Revision: https://reviews.llvm.org/D114139
2021-11-18 22:40:29 +00:00
River Riddle 0c7890c844 [mlir] Convert NamedAttribute to be a class
NamedAttribute is currently represented as an std::pair, but this
creates an extremely clunky .first/.second API. This commit
converts it to a class, with better accessors (getName/getValue)
and also opens the door for more convenient API in the future.

Differential Revision: https://reviews.llvm.org/D113956
2021-11-18 05:39:29 +00:00
River Riddle 195730a650 [mlir][NFC] Replace references to Identifier with StringAttr
This is part of the replacement of Identifier with StringAttr.

Differential Revision: https://reviews.llvm.org/D113953
2021-11-16 17:36:26 +00:00
Butygin 526b71e44a [mlir] spirv: Add scf.while spirv conversion
* It works similarly to the scf.for conversion, but converts the condition and yield ops as part of the scf.while pattern, so it doesn't need to maintain external state

Differential Revision: https://reviews.llvm.org/D113007
2021-11-16 13:19:34 +03:00
Adrian Kuegel 921d91f3ac [mlir] Support multi-dimensional vectors in MathToLibm conversion.
Differential Revision: https://reviews.llvm.org/D113969
2021-11-16 11:13:52 +01:00
natashaknk 381677dfbf [tosa][mlir] Refactor tosa.reshape lowering to linalg for dynamic cases.
Split tosa.reshape into three individual lowerings: collapse, expand and a
combination of both. Add simple dynamic shape support.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D113936
2021-11-15 15:31:37 -08:00
Nicolas Vasilache ee80ffbf9a [mlir][Linalg] Add bounded recursion declaration to FMAOp -> LLVM conversion.
FMAOp -> LLVM conversion is done progressively by peeling off 1 dimension from FMAOp at each pattern iteration. Add the recursively bounded property declaration to the pattern so that the rewriter can apply it multiple times.

Without this, FMAOps with 3+D do not lower to LLVM.

Differential Revision: https://reviews.llvm.org/D113886
2021-11-15 12:41:52 +00:00
Alexander Belyaev 9b1d90e8ac [mlir] Move min/max ops from Std to Arith.
Differential Revision: https://reviews.llvm.org/D113881
2021-11-15 13:19:17 +01:00
Nicolas Vasilache f67171ac58 [mlir][Linalg] Make depthwise convolution naming scheme consistent.
Names should be consistent across all operations otherwise painful bugs will surface.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D113762
2021-11-15 07:54:29 +00:00
Thomas Raoux e7969240dc [mlir][VectorToGPU] Support more cases in conversion to MMA ops
Support load with broadcast, elementwise divf op and remove the
hardcoded restriction on the vector size. Picking the right size should
be enforced by the user and will fail conversion to llvm/spirv if it is not
supported.

Differential Revision: https://reviews.llvm.org/D113618
2021-11-11 13:10:38 -08:00
Rob Suderman 860d3811a9 [mlir][tosa] Add lowering for tosa.pad with explicit value
New TOSA pad operation can support explicitly specifying the pad value. Added
lowering to linalg that uses the explicit value.

Differential Revision: https://reviews.llvm.org/D113515
2021-11-10 14:15:20 -08:00
thomasraoux f309939d06 [mlir][nvvm] Remove special case ptr arithmetic lowering in gpu to nvvm
Use existing helper instead of handling only a subset of indices lowering
arithmetic. Also relax the restriction on the memref rank for the GPU mma ops
as we can now support any rank.

Differential Revision: https://reviews.llvm.org/D113383
2021-11-10 10:00:12 -08:00
Alex Zinenko e64c76672f [mlir] recursively convert builtin types to LLVM when possible
Given that LLVM dialect types may now optionally contain types from other
dialects, which itself is motivated by dialect interoperability and progressive
lowering, the conversion should no longer assume that the outermost LLVM
dialect type can be left as is. Instead, it should inspect the types it
contains and attempt to convert them to the LLVM dialect. Introduce this
capability for LLVM array, pointer and structure types. Only literal structures
are currently supported as handling identified structures requires the
conversion infrastructure to have a mechanism for avoiding infinite recursion in
case of recursive types.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D112550
2021-11-10 18:11:00 +01:00
River Riddle 937e40a8cf [mlir] Remove the non-templated DenseElementsAttr::getSplatValue
This predates the templated variant, and has been simply forwarding
to getSplatValue<Attribute> for some time. Removing this makes the
API a bit more uniform, and also helps prevent users from thinking
it is "cheap".
2021-11-09 01:40:40 +00:00
River Riddle ae40d62541 [mlir] Refactor ElementsAttr's value access API
There are several aspects of the API that either aren't easy to use, or are
deceptively easy to do the wrong thing. The main change of this commit
is to remove all of the `getValue<T>`/`getFlatValue<T>` from ElementsAttr
and instead provide operator[] methods on the ranges returned by
`getValues<T>`. This provides a much more convenient API for the value
ranges. It also removes the easy-to-be-inefficient nature of
getValue/getFlatValue, which under the hood would construct a new range for
the type `T`. Constructing a range is not necessarily cheap in all cases, and
could lead to very poor performance if used within a loop; i.e. if you were to
naively write something like:

```
DenseElementsAttr attr = ...;
for (int i = 0; i < size; ++i) {
  // We are internally rebuilding the APFloat value range on each iteration!!
  APFloat it = attr.getFlatValue<APFloat>(i);
}
```
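
The cost difference can be modeled with a toy stand-in. The types below are hypothetical and are not the MLIR API; they only count how often a value range is built, showing the "build the range once, then index it" shape that the new `operator[]` on ranges encourages.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in, NOT the MLIR API: records how many value ranges were built.
struct ToyElements {
  std::vector<double> data;
  mutable int rangesBuilt = 0;

  struct Range {
    const std::vector<double> *data;
    double operator[](std::size_t i) const { return (*data)[i]; }
    std::size_t size() const { return data->size(); }
  };

  // Models a potentially expensive range construction.
  Range getValues() const {
    ++rangesBuilt;
    return Range{&data};
  }
};

double sumHoisted(const ToyElements &attr) {
  // Build the range once, outside the loop, then index into it.
  auto values = attr.getValues();
  double sum = 0;
  for (std::size_t i = 0; i < values.size(); ++i)
    sum += values[i];
  return sum;
}
```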

Differential Revision: https://reviews.llvm.org/D113229
2021-11-09 00:15:08 +00:00
thomasraoux d88cc07943 [mlir][gpuTonvvm] Remove hardcoded values in MMAType to llvm struct
Also relax the types allowed in GPU wmma ops

Differential Revision: https://reviews.llvm.org/D112969
2021-11-02 08:12:27 -07:00
thomasraoux 7fbb0678fa [mlir][VectorToGPU] Add support for elementwise mma to vector to GPU
Differential Revision: https://reviews.llvm.org/D112960
2021-11-02 08:01:04 -07:00