Commit Graph

866 Commits

Author SHA1 Message Date
Thomas Raoux a57ccad5a6 [VectorToGPU] Fix horizontal stride calculation for N-D memref
Fix a bug in how we calculate the stride of mma load/store ops for N-D
memrefs

Differential Revision: https://reviews.llvm.org/D118378
2022-01-27 13:35:56 -08:00
natashaknk 024a1fab5c [tosa][mlir] Add dynamic shape support for remaining ops
Added support for concat, tile, pad, argmax and table ops

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D118397
2022-01-27 11:25:38 -08:00
Benjamin Kramer 608cc6b163 [mlir][complex] Lower complex.constant to LLVM
This fixes a regression from 480cd4cb85

Differential Revision: https://reviews.llvm.org/D118347
2022-01-27 13:48:23 +01:00
River Riddle 632a4f8829 [mlir] Move std.generic_atomic_rmw to the memref dialect
This is part of splitting up the standard dialect. The move makes sense anyways,
given that the memref dialect already holds memref.atomic_rmw which is the non-region
sibling operation of std.generic_atomic_rmw (the relationship is even more clear given
they have nearly the same description % how they represent the inner computation).

Differential Revision: https://reviews.llvm.org/D118209
2022-01-26 11:52:01 -08:00
Chuanqi Xu dbbe010908 [MLIR] [AsyncToLLVM] Use llvm.coro.align intrinsic
Use llvm.coro.align to align coroutine frame properly.

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D117978
2022-01-25 19:28:25 +08:00
harsh e01e4c9115 Fix bugs in GPUToNVVM lowering
The current lowering from GPU to NVVM does
not correctly handle the following cases when
lowering the gpu shuffle op.

1. When the active width is set to 32 (all lanes),
then the current approach computes (1 << 32) -1 which
results in poison values in the LLVM IR. We fix this by
defining the active mask as (-1) >> (32 - width).

2. In the case of shuffle up, the computation of the third
operand c has to be different from the other 3 modes due to
the op definition in the ISA reference.
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html)
Specifically, the predicate value is computed as j >= maxLane
for up and j <= maxLane for all other modes. We fix this by
computing maskAndClamp as 32 - width for this mode.

TEST: We modify the existing test and add more checks for the up mode.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D118086
2022-01-25 03:24:14 +00:00
Rob Suderman 3e746c6d9e [mlir] Add support for ExpM1 to GLSL/OpenCL SPIRV Backends
Adding a similar decomposition for exponential minus one to the SPIRV
backends along with the necessary tests.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D118081
2022-01-24 15:38:34 -08:00
Alexander Belyaev fd0c6f5391 [mlir] Move linalg::PadTensorOp to tensor::PadOp.
RFC: https://llvm.discourse.group/t/rfc-move-linalg-padtensorop-to-tensor-padop/5785

Differential Revision: https://reviews.llvm.org/D117892
2022-01-21 20:02:39 +01:00
Lei Zhang 4710750854 [mlir][spirv] Support size-1 vector inserts during conversion
Differential Revision: https://reviews.llvm.org/D115517
2022-01-21 13:56:26 -05:00
Mogball e99835ffed [mlir][pdl] Make `pdl` the default dialect when parsing/printing
PDLDialect being a somewhat user-facing dialect and whose ops contain exclusively other PDL ops in their regions can take advantage of `OpAsmOpInterface` to provide nicer IR.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D117828
2022-01-20 20:22:53 +00:00
Mogball 7c471b56f2 [mlir][pdl] OperationOp should not be side-effect free
Unbound OperationOp in the matcher (i.e. one with no uses) is already disallowed by the verifier. However, an OperationOp in the rewriter is not side-effect free -- it's creating an op!

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D117825
2022-01-20 20:22:01 +00:00
River Riddle d75c3e8396 [mlir] Don't print `// no predecessors` on entry blocks
Entry blocks can never have predecessors, so this is unnecessary.

Fixes #53287

Differential Revision: https://reviews.llvm.org/D117713
2022-01-19 15:57:58 -08:00
natashaknk b9b10c0e61 [tosa][mlir] Lowering for dynamic shapes in the reduce_x ops in tosa-to-linalg
Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D117691
2022-01-19 11:15:14 -08:00
Thomas Raoux d9edc1a585 [mlir][spirv] Add math.fma lowering to spirv
Differential Revision: https://reviews.llvm.org/D117704
2022-01-19 10:57:05 -08:00
Mogball aae5125550 [mlir] Replace StrEnumAttr -> EnumAttr in core dialects
Removes uses of `StrEnumAttr` in core dialects

Reviewed By: mehdi_amini, rriddle

Differential Revision: https://reviews.llvm.org/D117514
2022-01-18 17:15:00 +00:00
Mogball 5c36ee8d57 [mlir] Drop the leading space when printing regions
The leading space that is always printed at the beginning of regions is not consistent with other parts of the printing API. Moreover, this leading space can lead to undesirable assembly formats:

```
attr-dict-with-keyword $region
```

Prints as:

```
// Two spaces between `}` and `{`
attributes {foo}  { ... }
```

Moreover, the leading space results in the odd generic op format:

```
"test.op"() ( {...}) : () -> ()
```

Reviewed By: rriddle, mehdi_amini

Differential Revision: https://reviews.llvm.org/D117411
2022-01-18 16:52:34 +00:00
Nicolas Vasilache cc0d208805 [mlir][Linalg] Drop deprecated convolution vectorization patterns
Differential revision: https://reviews.llvm.org/D117326
2022-01-18 09:26:50 +00:00
Benjamin Kramer 964dc368e7 [AsyncToLLVM] aligned_alloc requires the size to be a multiple of aignment, so round up
Fixes a crash with debug malloc.
2022-01-17 21:48:00 +01:00
Nicolas Vasilache f40a579bea Revert "[mlir][Linalg] NFC - Drop vectorization reliance on ConvolutionOpInterface"
This reverts commit c8f5735301.

The integration tests are broken.
2022-01-17 19:38:07 +00:00
Benjamin Kramer 5acd6e0522 [AsyncToLLVM] Align frames to 64 bytes
Coroutine lowering always takes the natural alignment when spilling to
the frame (issue #53148) so using AVX2 or AVX512 in a coroutine doesn't
work. Always overalign to 64 bytes to avoid this issue until we have a
better solution.

Differential Revision: https://reviews.llvm.org/D117501
2022-01-17 18:51:42 +01:00
Nicolas Vasilache c8f5735301 [mlir][Linalg] NFC - Drop vectorization reliance on ConvolutionOpInterface
Differential Revision: https://reviews.llvm.org/D117323
2022-01-17 17:01:36 +00:00
Benoit Jacob 499703e9c0 Enable ReassociatingReshapeOpConversion with "non-identity" layouts.
Enable ReassociatingReshapeOpConversion with "non-identity" layouts.

This removes an early-return in this function, which seems unnecessary and is
preventing some memref.collapse_shape from converting to LLVM (see included lit test).

It seems unnecessary because the return message says "only empty layout map is supported"
but there actually is code in this function to deal with non-empty layout maps. Maybe
it refers to an earlier state of implementation and is just out of date?

Though, there is another concern about this early return: the condition that it actually
checks, `{src,dst}MemrefType.getLayout().isIdentity()`, is not quite the same as what the
return message says, "only empty layout map is supported". Stepping through this
`getLayout().isIdentity()` code in GDB, I found that it evaluates to `.getAffineMap().isIdentity()`
which does (AffineMap.cpp:271):

```
  if (getNumDims() != getNumResults())
    return false;
```

This seems that it would always return false for memrefs of rank greater than 1 ?

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114808
2022-01-13 17:46:20 +00:00
natashaknk 310e9636ca [tosa][mlir] Support dynamic batch dimension for ops where the batch dim is explicit
Dynamic batch for rescale, gather, max_pool, avg_pool, conv2D and depthwise_conv2D. Split helper functions into a separate header file.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D117031
2022-01-12 14:16:50 -08:00
Rob Suderman aa1c533a4e [mlir][tosa] Expand tosa.apply_scale lowering for vectors
Apply scale may encounter scalar, tensor, or vector operations. Expand the
lowering so that it can lower arbitrary of container types.

Reviewed By: NatashaKnk

Differential Revision: https://reviews.llvm.org/D117080
2022-01-12 14:07:52 -08:00
Alex Zinenko f50cfc44d6 [mlir] Require struct indices in LLVM::GEPOp to be constant
Recent commits added a possibility for indices in LLVM dialect GEP operations
to be supplied directly as constant attributes to ensure they remain such until
translation to LLVM IR happens. Make this required for indexing into LLVM
struct types to match LLVM IR requirements, otherwise the translation would
assert on constructing such IR.

For better compatibility with MLIR-style operation construction interface,
allow GEP operations to be constructed programmatically using Values pointing
to known constant operations as struct indices.

Depends On D116758

Reviewed By: wsmoses

Differential Revision: https://reviews.llvm.org/D116759
2022-01-07 09:56:05 +01:00
Stanislav Funiak 2692eae574 [MLIR][PDL] Refactor the positions for multi-root patterns.
When the original version of multi-root patterns was reviewed, several improvements were made to the pdl_interp operations during the review process. Specifically, the "get users of a value at the specified operand index" was split up into "get users" and "compare the users' operands with that value". The iterative execution was also cleaned up to `pdl_interp.foreach`. However, the positions in the pdl-to-pdl_interp lowering were not similarly refactored. This introduced several problems, including hard-to-detect bugs in the lowering and duplicate evaluation of `pdl_interp.get_users`.

This diff cleans up the positions. The "upward" `OperationPosition` was split-out into `UsersPosition` and `ForEachPosition`, and the operand comparison was replaced with a simple predicate. In the process, I fixed three bugs:
1. When multiple roots were had the same connector (i.e., a node that they shared with a subtree at the previously visited root), we would generate a single foreach loop rather than one foreach loop for each such root. The reason for this is that such connectors shared the position. The solution for this is to add root index as an id to the newly introduced `ForEachPosition`.
2. Previously, we would use `pdl_interp.get_operands` indiscriminately, whether or not the operand was variadic. We now correctly detect variadic operands and insert `pdl_interp.get_operand` when needed.
3. In certain corner cases, we would trigger the "connector has not been traversed yet" assertion. This was caused by not inserting the values during the upward traversal correctly. This has now been fixed.

Reviewed By: Mogball

Differential Revision: https://reviews.llvm.org/D116080
2022-01-04 08:03:44 +05:30
William S. Moses a6a583dae4 [MLIR] Move AtomicRMW into MemRef dialect and enum into Arith
Per the discussion in https://reviews.llvm.org/D116345 it makes sense
to move AtomicRMWOp out of the standard dialect. This was accentuated by the
need to add a fold op with a memref::cast. The only dialect
that would permit this is the memref dialect (keeping it in the standard dialect
or moving it to the arithmetic dialect would require those dialects to have a
dependency on the memref dialect, which breaks linking).

As the AtomicRMWKind enum is used throughout, this has been moved to Arith.

Reviewed By: Mogball

Differential Revision: https://reviews.llvm.org/D116392
2021-12-30 14:31:33 -05:00
MaheshRavishankar 7df7586a0b [mlir][MemRef] Deprecate unspecified trailing offset, size, and strides semantics of `OffsetSizeAndStrideOpInterface`.
The semantics of the ops that implement the
`OffsetSizeAndStrideOpInterface` is that if the number of offsets,
sizes or strides are less than the rank of the source, then some
default values are filled along the trailing dimensions (0 for offset,
source dimension of sizes, and 1 for strides). This is confusing,
especially with rank-reducing semantics. Immediate issue here is that
the methods of `OffsetSizeAndStridesOpInterface` assumes that the
number of values is same as the source rank. This cause out-of-bounds
errors.

So simplifying the specification of `OffsetSizeAndStridesOpInterface`
to make it invalid to specify number of offsets/sizes/strides not
equal to the source rank.

Differential Revision: https://reviews.llvm.org/D115677
2021-12-29 11:18:29 -08:00
William S. Moses 99fc000c87 [MLIR] Expose atomicrmw and/or
LLVM (dialect and IR) have atomics for and/or. This patch enables atomic_rmw ops in the standard dialect for and/or that lower to these (in addition to the existing atomics such as addi, etc).

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D116345
2021-12-29 00:23:28 -05:00
Rob Suderman f0cb77d7d5 [mlir][tosa] Resubmit split tosa-to-linalg named ops out of pass
Includes dependency fix that resulted in canonicalizer pass not linking in.

Linalg named ops lowering are moved to a separate pass. This allows TOSA
canonicalizers to run between named-ops lowerings and the general TOSA
lowerings. This allows the TOSA canonicalizers to run between lowerings.

Differential Revision: https://reviews.llvm.org/D116057
2021-12-28 11:22:58 -08:00
Mehdi Amini 735fe1da6b Revert "[mlir][tosa] Split tosa-to-linalg named ops out of pass"
This reverts commit 313de31fbb.

There is a missing CMake dependency, building with shared libraries is
broken:

55.509 [45/4/3061] Linking CXX shared library lib/libMLIRTosaToLinalg.so.14git
FAILED: lib/libMLIRTosaToLinalg.so.14git
...
TosaToLinalgPass.cpp: undefined reference to `mlir::createCanonicalizerPass()'
2021-12-24 00:09:15 +00:00
Rob Suderman 313de31fbb [mlir][tosa] Split tosa-to-linalg named ops out of pass
Linalg named ops lowering are moved to a separate pass. This allows TOSA
canonicalizers to run between named-ops lowerings and the general TOSA
lowerings. This allows the TOSA canonicalizers to run between lowerings.

Reviewed By: NatashaKnk

Differential Revision: https://reviews.llvm.org/D116057
2021-12-23 12:23:19 -08:00
Rob Suderman 0763f12213 [mlir][tosa] Handle rescale case where shift > 63
It is possible for the shift value to exceed the number of bits. In these
cases we can just multiply by zero. This is relatively rare occurence but
should be handled.

Reviewed By: not-jenni

Differential Revision: https://reviews.llvm.org/D115779
2021-12-16 15:30:48 -08:00
Lei Zhang 223be5f630 [mlir][spirv] Perform partial conversion in VectorToSPIRVPass
This allows the pass to participate in progressive lowering
and it also allows us to write tests better.

Along the way, cleaned up the tests.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D115756
2021-12-16 09:35:56 -05:00
Lei Zhang 96130b5dc7 [mlir][spirv] Support size-1 vector/tensor constant during conversion
Reviewed By: ThomasRaoux, mravishankar

Differential Revision: https://reviews.llvm.org/D115518
2021-12-14 15:58:08 -05:00
Alexander Belyaev 15f8f3e20a [mlir] Split std.rank into tensor.rank and memref.rank.
Move `std.rank` similarly to how `std.dim` was moved to TensorOps and MemRefOps.

Differential Revision: https://reviews.llvm.org/D115665
2021-12-14 10:15:55 +01:00
Jacques Pienaar efb7727a96 [mlir] Flag near misses in file splitting
Flags some potential cases where splitting isn't happening and so could result
in confusing results. Also update some test files where there were near misses
in splitting that seemed unintentional.

Differential Revision: https://reviews.llvm.org/D109636
2021-12-12 08:03:30 -08:00
Nicolas Vasilache 408553dd96 [mlir][Vector] Support 0-D vectors in `CreateMaskOp`
The 0-D case gets lowered in almost the same way that the 1-D case does
in VectorCreateMaskOpConversion. I also had to slightly update the
verifier for the op to always require exactly 1 operand in the 0-D case.

Depends On D115220

Reviewed by: ftynse

Differential revision: https://reviews.llvm.org/D115221
2021-12-12 13:32:29 +00:00
Michal Terepeta a0c930d312 [mlir][Vector] Support 0-D vectors in `CmpIOp`
Following the example of `VectorOfAnyRankOf`, I've done a few changes in the
`.td` files to help with adding the support for the 0-D case gradually.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D115220
2021-12-12 13:28:26 +00:00
River Riddle 233e9476d8 [mlir:PDL] Allow non-bound pdl.attribute/pdl.type operations that create constants
This allows for passing in these attributes/types to constraints/rewrites as arguments.

Differential Revision: https://reviews.llvm.org/D114817
2021-12-10 19:38:43 +00:00
Alexander Belyaev b618880e7b [mlir] Move `linalg.tensor_expand/collapse_shape` to TensorDialect.
RFC: https://llvm.discourse.group/t/rfc-reshape-ops-restructuring/3310

linalg.fill gets a canonicalizer, because `FoldFillWithTensorReshape` cannot be moved to tensorops (it uses linalg::FillOp inside). Before it was listed as a canonicalization pattern for the reshape operations, now it became a canonicalization for FillOp.

Differential Revision: https://reviews.llvm.org/D115502
2021-12-10 12:11:48 +01:00
Krzysztof Drewniak e1da62910e [MLIR][GPU] Define gpu.printf op and its lowerings
- Define a gpu.printf op, which can be lowered to any GPU printf() support (which is present in CUDA, HIP, and OpenCL). This op only supports constant format strings and scalar arguments
- Define the lowering of gpu.pirntf to a call to printf() (which is what is required for AMD GPUs when using OpenCL) as well as to the hostcall interface present in the AMD Open Compute device library, which is the interface present when kernels are running under HIP.
- Add a "runtime" enum that allows specifying which of the possible runtimes a ROCDL kernel will be executed under or that the runtime is unknown. This enum controls how gpu.printf is lowered

This change does not enable lowering for Nvidia GPUs, but such a lowering should be possible in principle.

And:
[MLIR][AMDGPU] Always set amdgpu-implicitarg-num-bytes=56 on kernels

This is something that Clang always sets on both OpenCL and HIP kernels, and failing to include it causes mysterious crashes with printf() support.

In addition, revert the max-flat-work-group-size to (1, 256) to avoid triggering bugs in the AMDGPU backend.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D110448
2021-12-09 15:54:31 +00:00
Rob Suderman 23149d522b [mlir] Added ctlz and cttz to math dialect and LLVM dialect
Count leading/trailing zeros are an existing LLVM intrinsic. Added LLVM
support for the intrinsics with lowerings from the math dialect to LLVM
dialect.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D115206
2021-12-08 14:32:15 -08:00
Butygin d8fce785de [mlir][spirv] math.erf OpenCL lowering
Differential Revision: https://reviews.llvm.org/D115335
2021-12-08 21:59:46 +03:00
Rob Suderman c5fef77bc3 [mlir] Add CtPop to MathOps with lowering to LLVM
math.ctpop maths to the llvm.ctpop intrinsic.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D114998
2021-12-06 11:54:20 -08:00
Alex Zinenko d64b3e47ba [mlir] Avoid needlessly converting LLVM named structs with compatible elements
Conversion of LLVM named structs leads to them being renamed since we cannot
modify the body of the struct type once it is set. Previously, this applied to
all named struct types, even if their element types were not affected by the
conversion. Make this behvaior only applicable when element types are changed.
This requires making the LLVM dialect type-compatibility check recursively look
at the element types (arguably, it should have been doing than since the moment
the LLVM dialect type system stopped being closed). In addition, have a more
lax check for outer types only to avoid repeated check when necessary (e.g.,
parser, verifiers that are going to also look at the inner type).

Reviewed By: wsmoses

Differential Revision: https://reviews.llvm.org/D115037
2021-12-06 13:42:11 +01:00
Michal Terepeta caf89c0db6 [mlir][Vector] Support 0-D vectors in `ConstantMaskOp`
To support creating both a mask with just a single `true` and `false` values,
I had to relax the restriction in the verifier that the rank is always equal to
the length of the attribute array, in other words, we now allow:

- `vector.constant_mask [0] : vector<i1>` which gets lowered to
  `arith.constant dense<false> : vector<i1>`
- `vector.constant_mask [1] : vector<i1>` which gets lowered to
  `arith.constant dense<true> : vector<i1>`

(the attribute list for the 0-D case must be a singleton containing
either `0` or `1`)

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D115023
2021-12-06 08:03:04 +00:00
Alex Zinenko 9dd1f8dfdd [mlir] support recursive type conversion of named LLVM structs
A previous commit added support for converting elemental types contained in
LLVM dialect types in case they were not compatible with the LLVM dialect. It
was missing support for named structs as they could be recursive, which was not
supported by the conversion infra. Now that it is, add support for converting
such named structs.

Depends On D113579

Reviewed By: wsmoses

Differential Revision: https://reviews.llvm.org/D113580
2021-12-03 12:41:40 +01:00
Michal Terepeta 1423e8bf5d [mlir][Vector] Support 0-D vectors in `BitCastOp`
The implementation only allows to bit-cast between two 0-D vectors. We could
probably support casting from/to vectors like `vector<1xf32>`, but I wasn't
convinced that this would be important and it would require breaking the
invariant that `BitCastOp` works only on vectors with equal rank.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114854
2021-12-03 08:55:59 +00:00
Nicolas Vasilache c537a94334 [mlir][Vector] Thread 0-d vectors through vector.transfer ops
This revision adds 0-d vector support to vector.transfer ops.
In the process, numerous cleanups are applied, in particular around normalizing
and reducing the number of builders.

Reviewed By: ThomasRaoux, springerm

Differential Revision: https://reviews.llvm.org/D114803
2021-12-01 16:49:43 +00:00