2128 Commits

Author SHA1 Message Date
William S. Moses 1773dddadf [MLIR][Math] Enable constant folding of ops
Enable constant folding of ops within the math dialect, and introduce constant folders for ceil and log2
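
As a rough illustration of what a constant folder for ceil and log2 does, here is a minimal Python sketch. It is not the MLIR C++ implementation; the `_FOLDERS` table and `fold_unary` are hypothetical names, and declining to fold non-finite or non-positive inputs is an assumption made only to keep the sketch safe.

```python
import math

# Hypothetical table of foldable ops; the names mirror the commit text only.
_FOLDERS = {
    "ceil": lambda x: float(math.ceil(x)),
    # log2 is only folded for positive inputs in this sketch.
    "log2": lambda x: math.log2(x) if x > 0 else None,
}


def fold_unary(op_name, operand):
    """Return the folded constant, or None when folding does not apply."""
    if operand is None:
        # The operand is not a known constant, so nothing can be folded.
        return None
    if not math.isfinite(operand):
        # Assumption of this sketch: leave NaN and infinities unfolded.
        return None
    folder = _FOLDERS.get(op_name)
    if folder is None:
        # No folder registered for this op.
        return None
    return folder(operand)
```

Returning `None` stands in for "leave the op as it is", which is how a folder signals that it has no simplification to offer.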

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D117085
2022-01-12 12:19:29 -05:00
Matthias Springer 57e714bcc8 [mlir][linalg][bufferize] Add pass options for `createDeallocs`
This change makes it possible to use a different buffer deallocation strategy. E.g., `-buffer-deallocation` can be used, which also works for allocations that are not in destination-passing style.

Differential Revision: https://reviews.llvm.org/D117096
2022-01-12 18:55:36 +09:00
Matthias Springer 6c654b5198 [mlir][linalg][bufferize] Support std.select bufferization
This op is an example of how to deal with ops whose OpResult may alias with one of multiple OpOperands.

Differential Revision: https://reviews.llvm.org/D116868
2022-01-12 17:46:44 +09:00
William S. Moses d2c547342c Revert "[MLIR][Math] Enable constant folding of ops"
This reverts commit 2f8b956ab6.

There is a linker error for mlir-nvidia as seen on
https://lab.llvm.org/buildbot/#/builders/61/builds/19939.

As it's late for me here, I'm going to revert this for now to be
investigated later.
2022-01-12 02:00:53 -05:00
William S. Moses 2f8b956ab6 [MLIR][Math] Enable constant folding of ops
Enable constant folding of ops within the math dialect, and introduce constant folders for ceil and log2

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D117085
2022-01-12 01:55:48 -05:00
William S. Moses aaa0c81683 [MLIR][LLVM] Add memoryeffect for alloca
Add memory effect for llvm.alloca op

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D117086
2022-01-12 01:53:24 -05:00
William S. Moses d23fa4f2f1 [MLIR][SCF] Remove unused arguments to whileop
Canonicalize away unused arguments to the before region of a whileOp

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D117059
2022-01-11 20:18:08 -05:00
William S. Moses 97567bde5b [MLIR][SCF] Canonicalize while statement whose cmp condition is recomputed in the after region
Given a while loop whose condition is given by a cmp, don't recompute the comparison (or its inverse) in the after region; instead use a constant, since the original condition must be true if we branched to the after region.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D117047
2022-01-11 18:34:04 -05:00
MaheshRavishankar e7cb716ef9 [mlir][Linalg] Pattern to fuse pad operation with elementwise operations.
Most convolution operations need explicit padding of the input to
ensure all accesses are inbounds. In such cases, having a pad
operation can be a significant overhead. One way to reduce that
overhead is to try to fuse the pad operation with the producer of its
source.

A sequence

```
linalg.generic -> linalg.pad_tensor
```

can be replaced with

```
linalg.fill -> tensor.extract_slice -> linalg.generic ->
tensor.insert_slice.
```

if the `linalg.generic` has all parallel iterator types.

Differential Revision: https://reviews.llvm.org/D116418
2022-01-11 13:37:25 -08:00
William S. Moses 65c15cbd4a [MLIR][LLVM] Add MemRead/MemWrite behavior to llvm store/load/addressof ops
This patch adds corresponding memory effects to mlir llvm-dialect load/store/addressof ops, which thus enables canonicalizations of those ops (like dead code elimination) that rely on the effect interface

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D117041
2022-01-11 14:55:30 -05:00
Aaron DeBattista dfd070820c [mlir][tosa] Allow optional TOSA decompositions to be populated separately
Moved all TOSA decomposition patterns so that they can be optionally populated
and used by external rewrites. This avoids decomposing TOSA operations when
backends may benefit from the non-decomposed version.

Reviewed By: rsuderman, mehdi_amini

Differential Revision: https://reviews.llvm.org/D116526
2022-01-11 10:26:30 -08:00
William S. Moses 5443d2ed98 [MLIR][SCF] Simplify scf.if by swapping regions if condition is a not
Given an if whose condition is a not, simplify it by eliminating the not and swapping the regions:

scf.if not(c) {
  yield origTrue
} else {
  yield origFalse
}

becomes

scf.if c {
  yield origFalse
} else {
  yield origTrue
}
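
The equivalence of the two forms can be checked with a small Python model. This is only a sketch of the semantics; `if_not`, `if_swapped` and `rewrite_is_equivalent` are hypothetical helper names, not part of MLIR.

```python
def if_not(c, orig_true, orig_false):
    # Original form: the condition is not(c).
    return orig_true if (not c) else orig_false


def if_swapped(c, orig_true, orig_false):
    # Rewritten form: the condition is c and the two regions are swapped.
    return orig_false if c else orig_true


def rewrite_is_equivalent():
    # Both forms must yield the same value for either value of c.
    return all(
        if_not(c, "origTrue", "origFalse") == if_swapped(c, "origTrue", "origFalse")
        for c in (False, True)
    )
```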

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D116990
2022-01-11 12:57:29 -05:00
Christian Sigg d345ce65ff Mark arith.minf, arith.maxf as commutative.
Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D117010
2022-01-11 17:47:53 +01:00
Matthias Springer 2c5c5ca868 [mlir][linalg][bufferize] Fix CallOp bufferization
Previously, CallOps did not have any aliasing OpResult/OpOperand pairs. Therefore, CallOps were mostly ignored by the analysis and buffer copies were not inserted when necessary.

This commit introduces the following changes:
* Function bbArgs are writable by default. A function can now be bufferized without inspecting its callers.
* Callers must introduce buffer copies of function arguments when necessary. If a function is external, the caller must conservatively assume that a function argument is modified by the callee after bufferization. If the function is not external, the caller inspects the callee to determine if a function argument is modified.

Differential Revision: https://reviews.llvm.org/D116457
2022-01-11 20:10:21 +09:00
Diego Caballero e2b658cd5d [mlir][GPU] Fix attribute name of DL specification
D115722 added a DL spec to GPU modules. It happens that the DL
default interface implementation is sensitive to the name of the
DL spec attribute. This patch is fixing the name of the attribute
to be the expected one.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D116956
2022-01-11 08:30:52 +00:00
William S. Moses a02af37560 [MLIR] Generalize select to arithmetic canonicalization
Given a select whose result is an i1, we can eliminate the conditional in the select completely by adding a few arithmetic operations.
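
The commit text does not spell out which arithmetic operations are used. One formulation that works for 1-bit values, shown here as an assumption rather than as the pattern the patch emits, is `(c and a) or (not(c) and b)`. The Python sketch below checks it exhaustively.

```python
from itertools import product


def select(c, a, b):
    # Reference semantics of select on 1-bit values (0 or 1).
    return a if c else b


def select_as_arith(c, a, b):
    # Conditional-free form: (c & a) | (~c & b), with ~c written as c ^ 1.
    return (c & a) | ((c ^ 1) & b)


def arith_form_matches_select():
    # Exhaustive check over all eight combinations of 1-bit inputs.
    return all(
        select(c, a, b) == select_as_arith(c, a, b)
        for c, a, b in product((0, 1), repeat=3)
    )
```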

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D116839
2022-01-10 11:50:17 -05:00
Nicolas Vasilache d0ee094d6a [mlir][Bufferize] Fix incorrect bufferization of rank-reducing tensor ops.
This revision fixes SubviewOp, InsertSliceOp, ExtractSliceOp construction during bufferization
where not all offset/size/stride operands were properly specified.

A test that exhibited problematic behaviors related to incorrect memref casts is introduced.
Init tensor optimization is disabled in the testing func bufferize pass.

Differential Revision: https://reviews.llvm.org/D116899
2022-01-10 10:14:55 -05:00
Nicolas Vasilache 1a2474b786 [mlir][Linalg] Disable init_tensor elimination by default
init_tensor elimination is arguably a pre-optimization that should be separated from comprehensive bufferization.
In any case it is still experimental and easily results in wrong IR with violated SSA def-use orderings.
Isolate the optimization behind a flag, separate the test cases and add a test case that would result in wrong IR.

Differential Revision: https://reviews.llvm.org/D116936
2022-01-10 09:19:18 -05:00
Stephan Herhut 33cec20dbd [mlir][memref] Tighten verification of memref.reinterpret_cast
We allow the omission of a map in memref.reinterpret_cast under the assumption
that the cast might cast to an identity layout. This change adds verification
that the static knowledge that is present in the reinterpret_cast supports
this assumption.

Differential Revision: https://reviews.llvm.org/D116601
2022-01-10 11:55:47 +01:00
Shraiysh Vaishay a8586b573e [mlir][OpenMP] Change the syntax of omp.atomic.read op
This patch changes the syntax of omp.atomic.read to take the address of
destination, instead of having the value in a result. This will allow
using omp.atomic.read operation within an omp.atomic.capture operation
thus making its implementation less complex.

Reviewed By: peixin

Differential Revision: https://reviews.llvm.org/D116396
2022-01-10 16:19:45 +05:30
Uday Bondhugula 9cf9ed94ed Multiple fixes to affine loop tiling return status and checks
Fix crash in the presence of yield values. Multiple fixes to affine loop
tiling pre-condition checks and return status. Do not signal pass
failure on a failure to tile since the IR is still valid. Detect index
set computation failure in checkIfHyperrectangular and return failure.
Replace assertions with proper status return. Move checks to an
appropriate place earlier in the utility before mutation happens.

Differential Revision: https://reviews.llvm.org/D116738
2022-01-08 16:50:44 +05:30
Matthias Springer 8e2b6aac32 [mlir][linalg][bufferize][NFC] Analyze OpOperands instead of OpResults
With this change, the analysis takes a look at OpOperands instead of OpResults. OpOperands can bufferize out-of-place (even if they have no aliasing OpResults). The analysis no longer cares about OpResults.

Previously, only OpResults could bufferize out-of-place, so OpOperands that have no aliasing OpResults were never copied by Comprehensive Bufferize. This does not fit well with the new CallOp bufferization that is introduced in a subsequent change. In essence, called FuncOps can then be treated as "black boxes" that may read/write to any bbArg, even if they do not return anything.

Differential Revision: https://reviews.llvm.org/D115706
2022-01-08 01:00:30 +09:00
Matthias Springer 547b9afc54 [mlir][linalg][bufferize][NFC] Add explicit inplaceable attrs to test cases
This is in preparation of fixing CallOp bufferization. Add explicit linalg.inplaceable attrs to all bbArgs, except for the ones where inplaceability should be decided by the analysis.

Differential Revision: https://reviews.llvm.org/D115840
2022-01-08 00:25:15 +09:00
Matthias Springer b8d0753694 [mlir][linalg][bufferize] Fix copy elision in `getResultBuffer`
A buffer copy may not be elided if the to-be-bufferized op is reading the data.

Differential Revision: https://reviews.llvm.org/D116454
2022-01-08 00:19:17 +09:00
Alex Zinenko f50cfc44d6 [mlir] Require struct indices in LLVM::GEPOp to be constant
Recent commits added a possibility for indices in LLVM dialect GEP operations
to be supplied directly as constant attributes to ensure they remain such until
translation to LLVM IR happens. Make this required for indexing into LLVM
struct types to match LLVM IR requirements, otherwise the translation would
assert on constructing such IR.

For better compatibility with MLIR-style operation construction interface,
allow GEP operations to be constructed programmatically using Values pointing
to known constant operations as struct indices.

Depends On D116758

Reviewed By: wsmoses

Differential Revision: https://reviews.llvm.org/D116759
2022-01-07 09:56:05 +01:00
Alex Zinenko 43ff4a6d55 [mlir] Add ConstantLike trait to LLVM::ConstantOp
This makes LLVM dialect constants work with `m_constant` matchers. Implement
the folding hook for this operation as required by the trait. This in turn
allows LLVM::ConstantOp to properly participate in constant-folding.

Depends On D116757

Reviewed By: wsmoses

Differential Revision: https://reviews.llvm.org/D116758
2022-01-07 09:56:03 +01:00
Alex Zinenko cafaa35036 [mlir] Make it possible to directly supply constant values to LLVM GEPOp
In LLVM IR, the GEP indices that correspond to structures are required to be
i32 constants. MLIR models constants as just values defined by special
operations, and there is no verification that it is the case for structure
indices in GEP. Furthermore, some common transformations such as control flow
simplification may lead to the operands becoming non-constant. Make it possible
to directly supply constant values to LLVM GEPOp to guarantee they remain
constant until the translation to LLVM IR. This is not yet a requirement and
the verifier is not modified, this will be introduced separately.

Reviewed By: wsmoses

Differential Revision: https://reviews.llvm.org/D116757
2022-01-07 09:56:01 +01:00
William S. Moses 34646a2f7e [MLIR][Arith] Fold repeated xor and trunc
This patch adds two folds. One for a repeated xor (e.g. xor(xor(x, a), a)) and one for a repeated trunc (e.g. trunc(trunc(x))).
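
Both folds rest on simple bit-level identities, which the Python sketch below checks. It assumes that the repeated trunc collapses into a single trunc to the final width; the commit text does not state the result explicitly. The helper names are hypothetical.

```python
def trunc(x, bits):
    """Keep only the low `bits` bits of x."""
    return x & ((1 << bits) - 1)


def repeated_xor_folds(bits=8):
    # xor(xor(x, a), a) == x for every pair of `bits`-wide values.
    values = range(1 << bits)
    return all((x ^ a) ^ a == x for x in values for a in values)


def repeated_trunc_folds():
    # Truncating i32 -> i16 -> i8 equals truncating i32 -> i8 directly.
    samples = [0, 1, 0xFF, 0x1234, 0xFFFF, 0x12345678, 0xFFFFFFFF]
    return all(trunc(trunc(x, 16), 8) == trunc(x, 8) for x in samples)
```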

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D116383
2022-01-07 03:36:10 -05:00
Shraiysh Vaishay 6bcb4c44de [mlir][OpenMP] Added omp.atomic.write lowering to LLVM IR
This patch adds omp.atomic.write lowering to LLVM IR.
Also, changed the syntax to have equal symbol instead of the comma to
make it more intuitive.

Reviewed By: kiranchandramohan, peixin

Differential Revision: https://reviews.llvm.org/D116416
2022-01-07 10:01:57 +05:30
MaheshRavishankar 4317a3dfad [mlir][Linalg] Disable fusion of reshape with elementwise ops for purely dynamic cases.
`tensor.collapse_shape` op when fused with a consumer elementwise
`linalg.generic` operation results in creation of tensor.expand_shape
ops. In purely dynamic cases this can end up with a dynamic dimension
being expanded to more than one dynamic dimension. This is disallowed
by the semantics of `tensor.expand_shape` operation. (While the
transformation is itself correct, it's a gap in the specification of
`tensor.expand_shape` that is the issue). So disallow fusions which
result in such a pattern.

Differential Revision: https://reviews.llvm.org/D116703
2022-01-06 10:32:24 -08:00
Matthias Springer 15c7e3ee15 [mlir][linalg][bufferize][NFC] Use RewritePatterns instead of custom traversal
This change simplifies BufferizableOpInterface and other functions. Overall, the API will get smaller: Functions related to custom IR traversal are deleted entirely. This makes it easier to write BufferizableOpInterface implementations.

This is also in preparation of unifying Comprehensive Bufferize and core bufferization. While Comprehensive Bufferize could theoretically maintain its own IR traversal, there is no reason to do so, because all bufferize implementations in BufferizableOpInterface have to support partial bufferization anyway. And we can share a larger part of the code base between the two bufferizations.

Differential Revision: https://reviews.llvm.org/D116448
2022-01-07 00:56:54 +09:00
Matthias Springer 18e08fbd01 [mlir][linalg][bufferize] Fix tiled_loop bufferization
Until now, bufferization assumed that the yielded tensor of a linalg.tiled_loop is an output tensor. This is not necessarily the case.

Differential Revision: https://reviews.llvm.org/D116685
2022-01-06 17:51:33 +09:00
William S. Moses 358d020017 [MLIR][LLVM] Add simple folders for bitcast/addrspacecast/gep
Add 5 simple folders
* bitcast(x : T0, T0) -> x
* addrcast(x : T0, T0) -> x
* bitcast(bitcast(x : T0, T1), T0) -> x
* addrcast(addrcast(x : T0, T1), T0) -> x
* gep %x:T, 0 -> %x:T

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D116715
2022-01-05 21:17:32 -05:00
Alex Zinenko 06cc2f2f12 [mlir] Align LLVM_Type ODS constraint on type verifiers
Verify only the outer type being LLVM-compatible, the elemental types if
present are already checked by the type verifiers. This makes some LLVM dialect
operations compatible with mixed-dialect types that appear during progressive
lowering.

Reviewed By: wsmoses

Differential Revision: https://reviews.llvm.org/D116671
2022-01-05 19:00:56 +01:00
Matthias Springer b15b0156ca [mlir][linalg][bufferize][NFC] Simplify bufferization of CallOps
There is no need to inspect the ReturnOp of the called function.

This change also refactors the bufferization of CallOps in such a way that `lookupBuffer` is called only a single time. This is important for a later change that fixes CallOp bufferization. (There is currently a TODO among the test cases.)

Note: This change modifies a test case but is marked as NFC. There is no change of functionality, but FuncOps with empty bodies are now reported with a different error message.

Differential Revision: https://reviews.llvm.org/D116446
2022-01-06 00:28:47 +09:00
Matthias Springer a98c5a08b1 [mlir][linalg][bufferize] Fix CallOps with non-tensor operands
Such CallOps were not handled properly. When computing the new result types (and replacement values) of a CallOp, non-tensor return values were not accounted for.

Differential Revision: https://reviews.llvm.org/D116445
2022-01-06 00:19:23 +09:00
wren romano bc04a47038 [mlir][sparse] adding OverheadType::kIndex
Depends On D115008

This change opens the way for D115012, and removes some corner cases in `CodegenUtils.cpp`. The `SparseTensorAttrDefs.td` already specifies that we allow `0` bitwidth for the two overhead types and that it is interpreted to mean the architecture's native width.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D115010
2022-01-04 16:15:54 -08:00
Stanislav Funiak de6c82d6fd [MLIR][PDL] Generalize result type verification
Presently the result type verification checks if the type is used by a `pdl::OperationOp` inside the matcher. This is unnecessarily restrictive; the type could come from a `pdl::OperandOp` or `pdl::OperandsOp` and still be inferrable.

Reviewed By: rriddle, Mogball

Differential Revision: https://reviews.llvm.org/D116083
2022-01-04 08:11:46 +05:30
Alexander Belyaev 550ea385ab [mlir] Remove unnecessary canonicalization from Linalg Detensorize.cpp
After https://reviews.llvm.org/D115821 it became possible to create
`tensor<elem_type>` with a single `tensor.from_elements` operation without
collapsing tensor shape from `tensor<1xelem_type>` to `tensor<elem_type>`

Differential Revision: https://reviews.llvm.org/D115891
2022-01-03 16:33:45 +01:00
William S. Moses 21aa2a1b09 [MLIR] Create add of sub folder
Create folders for add(sub(a, b), b) -> a and add(b, sub(a, b)) -> a
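
These folds hold in wrapping (modular) integer arithmetic, including when the subtraction underflows. The Python sketch below checks both forms exhaustively for 8-bit values; `add`, `sub` and `add_of_sub_folds` are hypothetical helpers that model fixed-width integers.

```python
def add(a, b, bits=8):
    # Wrapping addition on `bits`-wide unsigned values.
    return (a + b) & ((1 << bits) - 1)


def sub(a, b, bits=8):
    # Wrapping subtraction on `bits`-wide unsigned values.
    return (a - b) & ((1 << bits) - 1)


def add_of_sub_folds(bits=8):
    # add(sub(a, b), b) == a and add(b, sub(a, b)) == a for all inputs.
    values = range(1 << bits)
    return all(
        add(sub(a, b, bits), b, bits) == a and add(b, sub(a, b, bits), bits) == a
        for a in values
        for b in values
    )
```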

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D116471
2022-01-03 09:28:20 -05:00
William S. Moses 93c791839a [MLIR] Canonicalize/fold select %x, 1, 0 to extui
Two canonicalizations for select %x, 1, 0
  If the return type is i1, return simply the condition %x, otherwise extui %x to the return type.
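
A small Python model of why this is sound: selecting between 1 and 0 on a 1-bit condition produces exactly the zero-extended condition. The helper names are hypothetical and the sketch only models values, not MLIR types.

```python
def select(c, a, b):
    # Reference semantics of select with a 1-bit condition.
    return a if c else b


def extui_i1(x, dst_bits):
    # Zero-extending a 1-bit value keeps its numeric value (0 or 1).
    return x & 1


def select_one_zero_is_extui():
    # For an i1 result the fold returns x itself; for wider results, extui(x).
    return all(
        select(x, 1, 0) == extui_i1(x, dst_bits)
        for x in (0, 1)
        for dst_bits in (1, 8, 32)
    )
```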

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D116517
2022-01-03 01:26:28 -05:00
William S. Moses 834cf3be22 [MLIR][Arith] Canonicalize and/or with ext
Replace and(ext(a),ext(b)) with ext(and(a,b)). This both removes one instruction and results in the computation (and/or) being done on a smaller type.
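
The rewrite is valid because bitwise and/or commute with both zero and sign extension. The Python sketch below checks this exhaustively for a 4-bit to 8-bit extension; the widths and the helpers `extui`, `extsi` and `ext_commutes_with_bitwise` are illustrative choices, not taken from the patch.

```python
def extui(x, src, dst):
    # Zero extension: the value of the low `src` bits is unchanged.
    return x & ((1 << src) - 1)


def extsi(x, src, dst):
    # Sign extension: replicate the sign bit into bits src..dst-1.
    x &= (1 << src) - 1
    if x >> (src - 1):
        x |= ((1 << dst) - 1) ^ ((1 << src) - 1)
    return x


def ext_commutes_with_bitwise(src=4, dst=8):
    values = range(1 << src)
    for ext in (extui, extsi):
        for a in values:
            for b in values:
                if ext(a, src, dst) & ext(b, src, dst) != ext(a & b, src, dst):
                    return False
                if ext(a, src, dst) | ext(b, src, dst) != ext(a | b, src, dst):
                    return False
    return True
```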

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D116519
2022-01-03 01:25:30 -05:00
William S. Moses 1bb9f4e482 [MLIR] Create folders for extsi/extui
Create folders/canonicalizers for extsi/extui. Specifically,

extui(extui(x)) -> extui(x)
extsi(extsi(x)) -> extsi(x)
extsi(extui(x)) -> extui(x)
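
The three rules can be checked with a bit-level Python model, here for i8 -> i16 -> i32. The third rule holds because a strictly widening extui always leaves the new sign bit clear. The widths and helper names are assumptions of this sketch.

```python
def extui(x, src, dst):
    # Zero extension: the value of the low `src` bits is unchanged.
    return x & ((1 << src) - 1)


def extsi(x, src, dst):
    # Sign extension: replicate the sign bit into bits src..dst-1.
    x &= (1 << src) - 1
    if x >> (src - 1):
        x |= ((1 << dst) - 1) ^ ((1 << src) - 1)
    return x


def ext_chain_folds():
    for x in range(1 << 8):
        if extui(extui(x, 8, 16), 16, 32) != extui(x, 8, 32):
            return False
        if extsi(extsi(x, 8, 16), 16, 32) != extsi(x, 8, 32):
            return False
        # After extui, bit 15 is zero, so the outer extsi adds only zeros.
        if extsi(extui(x, 8, 16), 16, 32) != extui(x, 8, 32):
            return False
    return True
```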

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D116515
2022-01-03 00:11:23 -05:00
William S. Moses 1a0a177965 [MLIR] Create fold for cmp of ext
This patch creates folds for cmpi( ext(%x : i1, iN) != 0) -> %x

In essence, this matches a != 0 comparison applied to an extension of a boolean, which is equivalent to the original condition.
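
A quick Python check of the identity for both kinds of extension: a 1-bit value extends to 0 or to a non-zero value, so comparing the result against 0 recovers the original bit. The target width of 8 bits and the helper names are illustrative only.

```python
def extui(x, src, dst):
    # Zero extension: the value of the low `src` bits is unchanged.
    return x & ((1 << src) - 1)


def extsi(x, src, dst):
    # Sign extension: replicate the sign bit into bits src..dst-1.
    x &= (1 << src) - 1
    if x >> (src - 1):
        x |= ((1 << dst) - 1) ^ ((1 << src) - 1)
    return x


def cmp_ne_zero_of_ext_is_identity(dst=8):
    # (ext(x) != 0) must equal x for x in {0, 1}, for either extension.
    return all(
        int(ext(x, 1, dst) != 0) == x
        for ext in (extui, extsi)
        for x in (0, 1)
    )
```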

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D116504
2022-01-02 19:48:52 -05:00
Markus Böck 3536d24a1a [mlir][LLVMIR] Add `llvm.eh.typeid.for` intrinsic
MLIR already exposes landingpads, the invokeop and the personality function on LLVM functions. With this intrinsic it should be possible to implement exception handling via the exception handling mechanisms provided by the Itanium ABI.

Differential Revision: https://reviews.llvm.org/D116436
2022-01-01 02:03:00 +01:00
William S. Moses a6a583dae4 [MLIR] Move AtomicRMW into MemRef dialect and enum into Arith
Per the discussion in https://reviews.llvm.org/D116345 it makes sense
to move AtomicRMWOp out of the standard dialect. This was accentuated by the
need to add a fold op with a memref::cast. The only dialect
that would permit this is the memref dialect (keeping it in the standard dialect
or moving it to the arithmetic dialect would require those dialects to have a
dependency on the memref dialect, which breaks linking).

As the AtomicRMWKind enum is used throughout, this has been moved to Arith.

Reviewed By: Mogball

Differential Revision: https://reviews.llvm.org/D116392
2021-12-30 14:31:33 -05:00
Nicolas Vasilache 2e69f4f012 [mlir][vector] Fix illegal vector.transfer + tensor.insert/extract_slice folding
vector.transfer operations do not have rank-reducing semantics.

Bail on illegal rank-reduction: we need to check that the rank-reduced
dims are exactly the leading dims. I.e. the following is illegal:
```
   %0 = vector.transfer_write %v, %t[0,0], %cst :
     vector<2x4xf32>, tensor<2x4xf32>
   %1 = tensor.insert_slice %0 into %tt[0,0,0][2,1,4][1,1,1] :
     tensor<2x4xf32> into tensor<2x1x4xf32>
```

Cannot fold into:
```
   %0 = vector.transfer_write %v, %t[0,0,0], %cst :
     vector<2x4xf32>, tensor<2x1x4xf32>
```
For this, check the trailing `vectorRank` dims of the insert_slice result
tensor match the trailing dims of the inferred result tensor.
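
The trailing-dims check can be sketched in Python as below. This is an interpretation of the commit text, not the actual C++ code: it assumes the "inferred result tensor" is the non-rank-reduced shape given by the slice sizes, and `trailing_dims_match` is a hypothetical helper.

```python
def trailing_dims_match(actual_shape, inferred_shape, vector_rank):
    """True if the last `vector_rank` dims of both shapes agree."""
    actual_tail = tuple(actual_shape[len(actual_shape) - vector_rank:])
    inferred_tail = tuple(inferred_shape[len(inferred_shape) - vector_rank:])
    return actual_tail == inferred_tail


# Illegal case from the message: tensor<2x4xf32> into a [2,1,4] slice.
# Trailing dims are (2, 4) versus (1, 4), so the fold must bail.
illegal = trailing_dims_match((2, 4), (2, 1, 4), vector_rank=2)

# A case where only the leading dim is reduced: (2, 4) versus (2, 4).
legal = trailing_dims_match((2, 4), (1, 2, 4), vector_rank=2)
```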

Differential Revision: https://reviews.llvm.org/D116409
2021-12-30 14:55:16 +00:00
MaheshRavishankar 7df7586a0b [mlir][MemRef] Deprecate unspecified trailing offset, size, and strides semantics of `OffsetSizeAndStrideOpInterface`.
The semantics of the ops that implement the
`OffsetSizeAndStrideOpInterface` is that if the number of offsets,
sizes or strides are less than the rank of the source, then some
default values are filled along the trailing dimensions (0 for offset,
source dimension of sizes, and 1 for strides). This is confusing,
especially with rank-reducing semantics. Immediate issue here is that
the methods of `OffsetSizeAndStridesOpInterface` assume that the
number of values is the same as the source rank. This causes out-of-bounds
errors.

So simplify the specification of `OffsetSizeAndStridesOpInterface`
to make it invalid to specify a number of offsets/sizes/strides not
equal to the source rank.
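
To make the deprecated behavior concrete, the Python sketch below models the old trailing defaults (0 for offsets, the source dimension for sizes, 1 for strides) next to the stricter rule. Both functions are hypothetical illustrations and do not correspond to functions in MLIR.

```python
def fill_trailing_defaults(source_shape, offsets, sizes, strides):
    """Model of the old semantics: pad missing trailing entries."""
    rank = len(source_shape)
    offsets = list(offsets) + [0] * (rank - len(offsets))
    # Missing sizes default to the corresponding source dimensions.
    sizes = list(sizes) + list(source_shape[len(sizes):])
    strides = list(strides) + [1] * (rank - len(strides))
    return offsets, sizes, strides


def is_fully_specified(source_shape, offsets, sizes, strides):
    """Model of the new rule: every list must match the source rank."""
    rank = len(source_shape)
    return len(offsets) == len(sizes) == len(strides) == rank
```

For a source of shape (8, 16, 4) with a single offset, size and stride given, the old semantics would pad the lists to rank 3, while the new rule rejects the op outright.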

Differential Revision: https://reviews.llvm.org/D115677
2021-12-29 11:18:29 -08:00
William S. Moses 180455ae5e [MLIR][LLVM] Expose powi intrinsic to MLIR
Expose the powi intrinsic to the LLVM dialect within MLIR

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D116364
2021-12-29 13:09:35 -05:00
William S. Moses ca8997eb7f [MLIR] Add constant folder for fptosi and friends
This patch adds constant folds for FPToSI/FPToUI/SIToFP/UIToFP

Reviewed By: mehdi_amini, bondhugula

Differential Revision: https://reviews.llvm.org/D116321
2021-12-28 23:50:01 -05:00
Adrian Kuegel 4a10457d33 [mlir][arith] Fix CmpIOP folding for vector types.
Previously, the folding assumed that it always operates on scalar types.

Differential Revision: https://reviews.llvm.org/D116151
2021-12-22 18:12:24 +01:00
Butygin 28ab10f404 [mlir][memref] ReinterpretCast: allow static sizes/strides/offset where affine map expects dynamic
* There is no reason to forbid that case
* Also, the user will get a very unfriendly error like `expected result type with offset = -9223372036854775808 instead of 1`

Differential Revision: https://reviews.llvm.org/D114678
2021-12-21 16:20:01 +03:00
Butygin c7f96d5ab1 [mlir][scf] Canonicalize nested scf.if's to scf.if + arith.and
Differential Revision: https://reviews.llvm.org/D115930
2021-12-20 21:53:03 +03:00
MaheshRavishankar 4142932a83 [mlir][Linalg] Move named op conversions out of canonicalizations.
These conversions are better suited to be applied at whole tensor
level. Applying these as canonicalizations ends up triggering such
canonicalizations at all levels of the stack which might be
undesirable. For example some of the resulting code patterns won't
bufferize in-place and need additional stack buffers. Best is to be
more deliberate in when these canonicalizations apply.

Differential Revision: https://reviews.llvm.org/D115912
2021-12-20 10:19:05 -08:00
bakhtiyar ec0e4545ca Make AsyncParallelForRewrite parameterizable with a cost model which drives deciding the parallelization granularity.
Reviewed By: ezhulenev, mehdi_amini

Differential Revision: https://reviews.llvm.org/D115423
2021-12-19 08:41:01 -08:00
Aaron DeBattista 64f694acaf [mlir][tosa] Move tosa canonicalizers to optional optimization pass
TOSA's canonicalizers that change dense operations should be moved to a
separate optimization pass to avoid canonicalizing to operations not supported
for relevant backends.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D115890
2021-12-16 23:33:54 -08:00
not-jenni f9cefc7b90 [mlir][tosa] Add tosa.max_pool2d as no-op canonicalization
When the input and output of a pool2d op are both 1x1, it can be canonicalized to a no-op

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D115908
2021-12-16 15:27:26 -08:00
Rob Suderman 9a2308e170 [mlir][tosa] Minor cleanup of tosa.conv2d canonicalizer
Slight rename and better variable type usage in tosa.conv2d to
tosa.fully_connected lowering. Included disabling pass for padded
convolutions.

Reviewed By: not-jenni

Differential Revision: https://reviews.llvm.org/D115776
2021-12-16 15:13:01 -08:00
Alexander Belyaev 88df30c8d8 [mlir] Add canonicalization for extract(tensor.from_elements) in 0d case.
Differential Revision: https://reviews.llvm.org/D115875
2021-12-16 15:46:57 +01:00
Alexander Belyaev f77e9f8768 [mlir] Extend `tensor.from_elements` to support N-D case.
RFC: https://llvm.discourse.group/t/rfc-extend-tensor-fromelementsop-to-n-d/4715

Differential Revision: https://reviews.llvm.org/D115821
2021-12-16 14:52:41 +01:00
Diego Caballero 32fe1a8a25 [mlir][GPU] Extend GPU kernel outlining to generate DL specification
This patch extends the GPU kernel outlining pass so that it can take in
an optional data layout specification that will be attached to the GPU
module operation generated. If the data layout specification is not provided
the default data layout is used instead.

Reviewed By: herhut, mehdi_amini

Differential Revision: https://reviews.llvm.org/D115722
2021-12-16 11:35:53 +00:00
Hanhan Wang 501674dc3b [mlir][Vector] Further fix to avoid infinite loop in InnerOuterDimReductionConversion
If all the dims are reduction dims, it is already in inner-most/outer-most
reduction form.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D115820
2021-12-15 13:54:15 -08:00
Matthias Springer 417014170b [mlir][linalg][bufferize] Replace remaining bvm usage with new API
* Call `replaceOp` instead of `mapBuffer`.
* Remove bvm and all helper functions around bvm.
* Simplify FuncOp bufferization and rely on existing functionality to generate ToMemrefOps for function BlockArguments.

Differential Revision: https://reviews.llvm.org/D115515
2021-12-15 23:21:39 +09:00
Matthias Springer a5927737da [mlir][linalg][bufferize] Reimplementation of scf.if bufferization
Instead of modifying the existing scf.if op, create a new op with memref OpOperands/OpResults and delete the old op.

New allocations / other memrefs can now be yielded from the op. This functionality is deactivated by default and guarded against by AssertDestinationPassingStyle.

Differential Revision: https://reviews.llvm.org/D115491
2021-12-15 18:40:54 +09:00
Javier Setoain a4830d14ed [mlir][RFC] Add scalable dimensions to VectorType
With VectorType supporting scalable dimensions, we don't need many of
the operations currently present in ArmSVE, like mask generation and
basic arithmetic instructions. Therefore, this patch also gets
rid of those.

Having built-in scalable vector support also simplifies the lowering of
scalable vector dialects down to LLVMIR.

Scalable dimensions are indicated with the scalable dimensions
between square brackets:

        vector<[4]xf32>

Is a scalable vector of 4 single precision floating point elements.

More generally, a VectorType can have a set of fixed-length dimensions
followed by a set of scalable dimensions:

        vector<2x[4x4]xf32>

Is a vector with 2 scalable 4x4 vectors of single precision floating
point elements.

The scale of the scalable dimensions can be obtained with the Vector
operation:

        %vs = vector.vscale

This change is being discussed in the discourse RFC:

https://llvm.discourse.group/t/rfc-add-built-in-support-for-scalable-vector-types/4484

Differential Revision: https://reviews.llvm.org/D111819
2021-12-15 09:31:37 +00:00
Matthias Springer 7161aa06ef [mlir][linalg][bufferize] Reimplementation of scf.for bufferization
Instead of modifying the existing scf.for op, create a new op with memref OpOperands/OpResults and delete the old op.

New allocations / other memrefs can now be yielded from the loop. This functionality is deactivated by default and guarded against by AssertDestinationPassingStyle.

This change also introduces `replaceOp`, which will be utilized by all other `bufferize` implementations in future commits. Bufferization will then no longer rely on old (pre-bufferize) ops to DCE away. Instead old ops are deleted on the spot. This improves debuggability because there won't be any duplicate ops anymore (bufferized + not-yet-bufferized) when dumping IR during bufferization. It is also less fragile because unbufferized IR can no longer silently "hang around" due to an implementation bug.

Differential Revision: https://reviews.llvm.org/D114926
2021-12-15 18:29:22 +09:00
gysit 9912bed730 [mlir][linalg] Remove RangeOp and RangeType.
Remove the RangeOp and the RangeType that are not actively used anymore. After removing RangeType, the LinalgTypes header only includes the generated dialect header.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D115727
2021-12-15 07:19:10 +00:00
Alexander Belyaev a82a19c137 [mlir] Add a missing pattern to bufferize tensor.rank.
Differential Revision: https://reviews.llvm.org/D115745
2021-12-14 20:04:57 +01:00
Alexander Belyaev 15f8f3e20a [mlir] Split std.rank into tensor.rank and memref.rank.
Move `std.rank` similarly to how `std.dim` was moved to TensorOps and MemRefOps.

Differential Revision: https://reviews.llvm.org/D115665
2021-12-14 10:15:55 +01:00
Markus Böck ef5be2bb16 [mlir] Implement `DataLayoutTypeInterface` for `LLVMArrayType`
Implementation of the interface allows querying the size and alignment of an LLVMArrayType as well as querying the size and alignment of a struct containing an LLVMArrayType.
The implementation should yield the same results as llvm::DataLayout, including support for over-aligned element types.
There is no customization point for adjusting an array's alignment; it is simply taken from the element type.

Differential Revision: https://reviews.llvm.org/D115704
2021-12-14 09:35:45 +01:00
Benoit Jacob aba437ceb2 [mlir][Vector] Patterns flattening vector transfers to 1D
This is the second part of https://reviews.llvm.org/D114993 after slicing
into 2 independent commits.

This is needed at the moment to get good codegen from 2d vector.transfer
ops that aim to compile to SIMD load/store instructions but that can
only do so if the whole 2d transfer shape is handled in one piece, in
particular taking advantage of the memref being contiguous rowmajor.

For instance, if the target architecture has 128bit SIMD then we would
expect that contiguous row-major transfers of <4x4xi8> map to one SIMD
load/store instruction each.

The current generic lowering of multi-dimensional vector.transfer ops
can't achieve that because it peels dimensions one by one, so a transfer
of <4x4xi8> becomes 4 transfers of <4xi8>.
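
The arithmetic behind this motivation can be written out in a few lines of Python. The helpers below are hypothetical and only count bits and transfers; they do not model the lowering itself.

```python
from math import prod


def transfer_bits(shape, elem_bits):
    # Total number of bits moved by a transfer of the given vector shape.
    return prod(shape) * elem_bits


def peeled_transfers(shape):
    # Peeling every dimension but the innermost yields this many 1-D
    # transfers, each with the innermost shape.
    return prod(shape[:-1]), tuple(shape[-1:])


# <4x4xi8> is 4 * 4 * 8 = 128 bits, which fits one 128-bit SIMD access.
bits_4x4xi8 = transfer_bits((4, 4), 8)

# The generic lowering instead produces 4 transfers of <4xi8> (32 bits each).
count, inner_shape = peeled_transfers((4, 4))
```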

The new patterns here are only enabled for now by
 -test-vector-transfer-flatten-patterns.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114993
2021-12-13 22:39:41 +00:00
Benoit Jacob 0aea49a730 [mlir][Vector] Patterns flattening vector transfers to 1D
This is the first part of https://reviews.llvm.org/D114993 which has been
split into small independent commits.

This is needed at the moment to get good codegen from 2d vector.transfer
ops that aim to compile to SIMD load/store instructions but that can
only do so if the whole 2d transfer shape is handled in one piece, in
particular taking advantage of the memref being contiguous row-major.

For instance, if the target architecture has 128-bit SIMD then we would
expect that contiguous row-major transfers of <4x4xi8> map to one SIMD
load/store instruction each.

The current generic lowering of multi-dimensional vector.transfer ops
can't achieve that because it peels dimensions one by one, so a transfer
of <4x4xi8> becomes 4 transfers of <4xi8>.

The new patterns here are only enabled for now by
 -test-vector-transfer-flatten-patterns.

Reviewed By: nicolasvasilache
2021-12-13 21:49:04 +00:00
Stella Laurenzo c10995a8ad Re-apply [NFC] Generalize a couple of passes so they can operate on any FunctionLike op.
* Generalizes passes linalg-detensorize, linalg-fold-unit-extent-dims, convert-elementwise-to-linalg.
* I feel that more work could be done in the future (i.e. make FunctionLike into a proper OpInterface and extend actions in dialect conversion to be trait based), and this patch would be a good record of why that is useful.
* Note for downstreams:
  * Since these passes are now generic, they do not automatically nest with pass managers set up for implicit nesting.
  * The Detensorize pass must run on a FunctionLike, and this requires explicit nesting.
* Addressed missed comments from the original and per-suggestion removed the assert on FunctionLike in ElementwiseToLinalg and DropUnitDims.cpp, which also is what was causing the integration test to fail.

This reverts commit aa8815e42e.

Differential Revision: https://reviews.llvm.org/D115671
2021-12-13 13:33:00 -08:00
Mehdi Amini aa8815e42e Revert "[NFC] Generalize a couple of passes so they can operate on any FunctionLike op."
This reverts commit 34696e6542.

A test is crashing on the mlir-nvidia bot.
2021-12-13 20:41:25 +00:00
Stella Laurenzo 34696e6542 [NFC] Generalize a couple of passes so they can operate on any FunctionLike op.
* Generalizes passes linalg-detensorize, linalg-fold-unit-extent-dims, convert-elementwise-to-linalg.
* I feel that more work could be done in the future (i.e. make FunctionLike into a proper OpInterface and extend actions in dialect conversion to be trait based), and this patch would be a good record of why that is useful.
* Note for downstreams:
  * Since these passes are now generic, they do not automatically nest with pass managers set up for that.
  * If running them over nested functions, you must nest explicitly. Upstream has adopted this style but *-opt still has some uses of implicit pipelines via args. See tests for argument changes needed.

Differential Revision: https://reviews.llvm.org/D115645
2021-12-13 12:01:53 -08:00
gysit 8fc0525a15 [mlir][linalg] Stage application of pad tensor op vectorization.
Adapt the LinalgStrategyVectorizationPattern pass to apply the vectorization patterns in two stages. The change ensures the generic pad tensor op vectorization pattern does not run too early. Additionally, the revision adds the transfer op canonicalization patterns to the set of applied patterns, since they are needed to enable efficient vectorization for rank-reduced convolutions.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D115627
2021-12-13 19:49:35 +00:00
gysit 6c85a49e22 [mlir][memref] Use current source type in getCanonicalSubViewResultType.
Use the current instead of the new source type to compute the rank-reduction map in getCanonicalSubViewResultType. Otherwise, the computation of the rank-reduction map fails when folding a cast into a subview since the strides of the new source type cannot be related to the strides of the current result type.

Depends On D115428

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D115446
2021-12-13 14:50:41 +00:00
Markus Böck 664cc9312c [mlir] Implement `DataLayoutTypeInterface` for `LLVMStructType`
Using this implementation of the interface it is possible to query the size, ABI alignment as well as the preferred alignment of a struct. It should yield the same results as LLVM's `llvm::DataLayout` on an equivalent `llvm::StructType`, including for packed structs.
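
A minimal sketch of the usual struct layout rule, assuming each field is described by a (size, ABI alignment) pair in bytes. The function names are hypothetical, and the sketch ignores preferred alignment and data layout entries.

```python
def align_to(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def struct_layout(fields, packed=False):
    """Return (size, abi_alignment) in bytes.

    fields: list of (size, abi_alignment) pairs, in declaration order.
    In a packed struct every field is treated as 1-byte aligned.
    """
    offset = 0
    struct_align = 1
    for size, align in fields:
        field_align = 1 if packed else align
        offset = align_to(offset, field_align)  # padding before the field
        offset += size
        struct_align = max(struct_align, field_align)
    # Tail padding so that arrays of the struct stay aligned.
    return align_to(offset, struct_align), struct_align


fields = [(1, 1), (4, 4)]  # e.g. an i8 followed by an i32
print(struct_layout(fields))
print(struct_layout(fields, packed=True))
```

The unpacked layout inserts three bytes of padding before the second field, while the packed layout places the fields back to back.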

Additionally it is also possible to increase the ABI and preferred alignment using a data layout entry with the type `llvm.struct<()>`, which serves the same functionality as the `a:` component in LLVM's data layout string.

Differential Revision: https://reviews.llvm.org/D115600
2021-12-13 15:09:16 +01:00
gysit db7a2e9176 [mlir][linalg] Only compose PadTensorOps if no ExtractSliceOp is rank-reducing.
Do not compose pad tensor operations if the extract slice of the outer pad tensor operation is rank-reducing. The inner extract slice op cannot be rank-reducing since its source type must match the desired type of the padding.

Depends On D115359

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D115428
2021-12-13 13:01:30 +00:00
gysit 6859f8ed1e [mlir][linalg] Adapt the PadTensorOpVectorizationWithInsertSlicePattern matching.
Tighten the matcher of the PadTensorOpVectorizationWithInsertSlicePattern pattern. Only match if the PadOp result is used by the InsertSliceOp source. Fail if the result is used by the InsertSliceOp dest.

Depends On D115336

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D115359
2021-12-13 12:55:07 +00:00
gysit f895e95138 [mlir][linalg] Make padding work for rank-reducing slice ops.
Adapt the computation of a static bounding box to take rank-reducing slice operations into account by filtering out the reduced size-one dimensions. The revision is needed to make padding work for decomposed convolution operations. The decomposition introduces rank-reducing extract slice operations that previously made padding fail.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D115336
2021-12-13 12:34:20 +00:00
Jacques Pienaar efb7727a96 [mlir] Flag near misses in file splitting
Flags some potential cases where splitting isn't happening and so could result
in confusing results. Also update some test files where there were near misses
in splitting that seemed unintentional.

Differential Revision: https://reviews.llvm.org/D109636
2021-12-12 08:03:30 -08:00
Nicolas Vasilache 408553dd96 [mlir][Vector] Support 0-D vectors in `CreateMaskOp`
The 0-D case gets lowered in almost the same way that the 1-D case does
in VectorCreateMaskOpConversion. I also had to slightly update the
verifier for the op to always require exactly 1 operand in the 0-D case.

Depends On D115220

Reviewed by: ftynse

Differential revision: https://reviews.llvm.org/D115221
2021-12-12 13:32:29 +00:00
Michal Terepeta a0c930d312 [mlir][Vector] Support 0-D vectors in `CmpIOp`
Following the example of `VectorOfAnyRankOf`, I've done a few changes in the
`.td` files to help with adding the support for the 0-D case gradually.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D115220
2021-12-12 13:28:26 +00:00
Nicolas Vasilache f2e945a393 Revert "[mlir][tensor] Fix insert_slice + tensor cast overflow"
This reverts commit 5601821dae.

The prefix + canonical complete behavior is actually obsolete and should not be reintroduced.
Reverting.
2021-12-10 22:53:52 +00:00
Nicolas Vasilache 5601821dae [mlir][tensor] Fix insert_slice + tensor cast overflow
InsertSliceOp may have subprefix semantics where missing trailing dimensions
are automatically inferred directly from the operand shape.
This revision fixes an overflow that occurs in such cases when the impl is based on the op rank.

Differential Revision: https://reviews.llvm.org/D115549
2021-12-10 21:41:26 +00:00
River Riddle 98f5bd3489 [mlir:PDL] Adjust the assembly format for AttributeOp to avoid conflicts with DictionaryAttr
Switch the attribute creation operations to use attr-dict-with-
keyword to avoid conflicts (in the case of pdl.attribute) and
confusion (in the case of pdl_interp.create_attribute) with
having a DictionaryAttr as a value and specifying the
attributes of the operation itself (as a dictionary).

Differential Revision: https://reviews.llvm.org/D114815
2021-12-10 19:38:42 +00:00
River Riddle 9debc35f02 [mlir:PDL] Fix assembly format for pdl.apply_native_rewrite
The results of a rewrite are optional, but we currently require
them to be present in the assembly format. This commit
makes the results component in the format optional.

Differential Revision: https://reviews.llvm.org/D114814
2021-12-10 19:38:42 +00:00
Alexander Belyaev b618880e7b [mlir] Move `linalg.tensor_expand/collapse_shape` to TensorDialect.
RFC: https://llvm.discourse.group/t/rfc-reshape-ops-restructuring/3310

linalg.fill gets a canonicalizer, because `FoldFillWithTensorReshape` cannot be moved to tensorops (it uses linalg::FillOp inside). Before it was listed as a canonicalization pattern for the reshape operations, now it became a canonicalization for FillOp.

Differential Revision: https://reviews.llvm.org/D115502
2021-12-10 12:11:48 +01:00
Rob Suderman 46c96fca0e [mlir][tosa] Fix quantized type for tosa.conv2d canonicalization
The wrong type was used for the result type in the tosa.conv2d canonicalization.
The result element type should match the result type,
not the input element type.

Differential Revision: https://reviews.llvm.org/D115463
2021-12-09 12:39:23 -08:00
MaheshRavishankar 9cfd8d7c6c [mlir][Vector] Avoid infinite loop in InnerOuterDimReductionConversion.
This pattern tries to convert an inner (outer) dim reduction to an
outer (inner) dim reduction. Doing this on a 1D or 0D vector results
in an infinite loop since the converted op is the same as the original
operation. Just returning failure when source rank <= 1 fixes the
issue.

Differential Revision: https://reviews.llvm.org/D115426
2021-12-09 09:30:05 -08:00
Krzysztof Drewniak e1da62910e [MLIR][GPU] Define gpu.printf op and its lowerings
- Define a gpu.printf op, which can be lowered to any GPU printf() support (which is present in CUDA, HIP, and OpenCL). This op only supports constant format strings and scalar arguments
- Define the lowering of gpu.printf to a call to printf() (which is what is required for AMD GPUs when using OpenCL) as well as to the hostcall interface present in the AMD Open Compute device library, which is the interface present when kernels are running under HIP.
- Add a "runtime" enum that allows specifying which of the possible runtimes a ROCDL kernel will be executed under or that the runtime is unknown. This enum controls how gpu.printf is lowered

This change does not enable lowering for Nvidia GPUs, but such a lowering should be possible in principle.

And:
[MLIR][AMDGPU] Always set amdgpu-implicitarg-num-bytes=56 on kernels

This is something that Clang always sets on both OpenCL and HIP kernels, and failing to include it causes mysterious crashes with printf() support.

In addition, revert the max-flat-work-group-size to (1, 256) to avoid triggering bugs in the AMDGPU backend.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D110448
2021-12-09 15:54:31 +00:00
Eugene Zhulenev 49ce40e9ab [mlir] AsyncParallelFor: align block size to be a multiple of inner loops iterations
Depends On D115263

By aligning the block size to the inner loop iterations in parallel_compute_fn, LLVM can later unroll and vectorize some of the inner loops with small trip counts. Up to 2x speedup in multiple benchmarks.
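
A minimal sketch of what aligning a block size to a multiple of the inner loop iterations could look like. Rounding up (rather than down) is an assumption made for illustration, and the function name is hypothetical.

```python
def align_block_size(block_size: int, inner_iterations: int) -> int:
    """Round block_size up to a multiple of inner_iterations.

    Assumption: rounding up is used; the actual pass may choose differently.
    """
    if inner_iterations <= 0:
        raise ValueError("inner_iterations must be positive")
    # Ceiling division followed by multiplication.
    return -(-block_size // inner_iterations) * inner_iterations


print(align_block_size(10, 4))
print(align_block_size(8, 4))
```

With an aligned block size, every block covers a whole number of inner loop trips, so the inner loops see a constant trip count.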

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D115436
2021-12-09 06:50:50 -08:00
Eugene Zhulenev 9f151b784b [mlir] AsyncParallelFor: sink constants into the parallel compute function
With the complex recursive structure of the async dispatch function, LLVM can't always propagate constants to the parallel_compute_fn, which often prevents optimizations like loop unrolling and vectorization. We help LLVM by pushing known constants into the parallel_compute_fn explicitly.

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D115263
2021-12-09 06:48:23 -08:00
Matthias Springer cc45a13422 [mlir][linalg][bufferize] LinalgOps can bufferize inplace with input args
LinalgOp results usually bufferize inplace with output args. With this change, they may bufferize inplace with input args if the value of the output arg is not used in the computation.

Differential Revision: https://reviews.llvm.org/D115022
2021-12-09 21:54:54 +09:00
Shraiysh Vaishay d82c1f4e4b [MLIR][OpenMP] Added omp.atomic.update
This patch supports the atomic construct (update) following section 2.17.7 of the OpenMP 5.0 standard. Also added tests and verifier for the same.

Reviewed By: kiranchandramohan, peixin

Differential Revision: https://reviews.llvm.org/D112982
2021-12-09 15:21:24 +05:30
Nicolas Vasilache d69f5e197c [mlir][memref] Fix subview offset verification.
Offset-specific verification seems to have been lost in one of the recent refactorings.
Also add proper tests that would have caught this omission.

This addresses the immediate issues discussed in:
https://llvm.discourse.group/t/memref-subview-affine-map-and-symbols/4851

Differential Revision: https://reviews.llvm.org/D115427
2021-12-09 07:44:51 +00:00
MaheshRavishankar 6d7c9c3d0e [mlir][Linalg] Bufferize the region of LinalgOps as well.
The region of `linalg.generic` might contain `tensor` operations. For
example, current lowering of `gather` uses a `tensor.extract` in the
body of the `LinalgOp`. Bufferize the ops within a `LinalgOp` region
as well to catch such cases.

Differential Revision: https://reviews.llvm.org/D115322
2021-12-08 22:36:01 -08:00
Thomas Raoux 579c1ff67d [mlir][nvvm] Add async copy ops to nvvm dialect
Differential Revision: https://reviews.llvm.org/D115314
2021-12-08 09:42:20 -08:00
Matthias Springer 847710f7b7 [mlir][linalg][bufferize] Add dialect filter to BufferizationOptions
This adds a new option `dialectFilter` to BufferizationOptions. Only ops from dialects that are allow-listed in the filter are bufferized. Other ops are left unbufferized. Note: This option requires `allowUnknownOps = true`.
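
The effect of an allow-list of dialects can be sketched as below. This toy model identifies ops only by their `dialect.name` string; the function name is hypothetical and this is not the BufferizationOptions API.

```python
def partition_by_dialect_filter(op_names, dialect_filter):
    """Split ops into (to_bufferize, left_unbufferized).

    op_names: op names of the form "dialect.op", e.g. "tensor.insert".
    dialect_filter: set of allow-listed dialect names.
    """
    to_bufferize, left_alone = [], []
    for name in op_names:
        dialect = name.split(".", 1)[0]
        if dialect in dialect_filter:
            to_bufferize.append(name)
        else:
            left_alone.append(name)
    return to_bufferize, left_alone


ops = ["tensor.insert", "scf.for", "linalg.generic"]
print(partition_by_dialect_filter(ops, {"tensor", "linalg"}))
```

Ops outside the filter stay unbufferized, which is why the option only makes sense when unknown ops are allowed.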

To make use of `dialectFilter`, BufferizationOptions or BufferizationState must be passed to various helper functions.

The purpose of this change is to provide a better infrastructure for partial bufferization, which will be fully activated in a subsequent change.

Differential Revision: https://reviews.llvm.org/D114691
2021-12-08 23:51:18 +09:00
Mehdi Amini ee0908703d Change the printing/parsing behavior for Attributes used in declarative assembly format
The new form of printing attribute in the declarative assembly is eliding the `#dialect.mnemonic` prefix to only keep the `<....>` part.

Differential Revision: https://reviews.llvm.org/D113873
2021-12-08 02:02:37 +00:00
Aart Bik e1b9d80532 [mlir][sparse] add a few more sparse output tests (for generated IR)
also fixes two typos in IR doc

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D115288
2021-12-07 15:31:29 -08:00
Aart Bik 4f2ec7f983 [mlir][sparse] finalize sparse output in the presence of reductions
This revision implements sparse outputs (from scratch) in all cases where
the loops can be reordered with all but one parallel loops outer. If the
inner parallel loop appears inside one or more reduction loops, then an
access pattern expansion is required (aka. workspaces in TACO speak).

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D115091
2021-12-07 10:54:29 -08:00
Rob Suderman e9fae0f19e [mlir][tosa] Disable tosa.depthwise_conv2d canonicalizer for quantized case
Quantized case needs to include zero-point corrections before the tosa.mul.
Disabled for the quantized use-case.

Reviewed By: NatashaKnk

Differential Revision: https://reviews.llvm.org/D115264
2021-12-07 10:16:12 -08:00
Matthias Springer 8a232632c5 [mlir][linalg][bufferize] Add FuncOp bufferization pass
This pass bufferizes FuncOp bodies, but not FuncOp boundaries.

Differential Revision: https://reviews.llvm.org/D114671
2021-12-07 21:44:26 +09:00
not-jenni 5911a29aa9 [mlir][tosa] Add tosa.depthwise_conv2d as tosa.mul canonicalization
For a 1x1 weight and stride of 1, the input/weight can be reshaped and
multiplied elementwise then reshaped back

Reviewed By: rsuderman, KoolJBlack

Differential Revision: https://reviews.llvm.org/D115207
2021-12-06 17:28:52 -08:00
Rob Suderman 05e33d846f [mlir][tosa] Resubmit add tosa.conv2d as tosa.fully_connected canonicalization
Fixed the tosa.conv2d to tosa.fully_connected canonicalization for incorrect
output channels. Included updates to tests to include checks for the result
shapes during canonicalization.

This allows conv2d to transform to the simpler fully_connected operation.

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D115170
2021-12-06 15:33:07 -08:00
Matthias Springer e9fb4dc9e9 [mlir][linalg][bufferize] Remove buffer equivalence from bufferize
Remove all function calls related to buffer equivalence from bufferize implementations.

Add a new PostAnalysisStep for scf.for that ensures that yielded values are equivalent to the corresponding BBArgs. (This was previously checked in `bufferize`.) This will be relaxed in a subsequent commit.

Note: This commit changes two test cases. These were broken by design
and should not have passed. With the new scf.for PostAnalysisStep, this
bug was fixed.

Differential Revision: https://reviews.llvm.org/D114927
2021-12-06 17:48:31 +09:00
Matthias Springer cb4d0bf997 [mlir][linalg][bufferize][NFC] Collect equivalent FuncOp BBArgs in PostAnalysisStep
Collect equivalent BBArgs right after the equivalence analysis of the FuncOp and before bufferizing. This is in preparation of decoupling bufferization from aliasInfo.

Also gather equivalence info for CallOps, which was missing in the
previous commit.

Differential Revision: https://reviews.llvm.org/D114847
2021-12-06 17:31:39 +09:00
Michal Terepeta caf89c0db6 [mlir][Vector] Support 0-D vectors in `ConstantMaskOp`
To support creating a mask holding just a single `true` or `false` value,
I had to relax the restriction in the verifier that the rank is always equal to
the length of the attribute array, in other words, we now allow:

- `vector.constant_mask [0] : vector<i1>` which gets lowered to
  `arith.constant dense<false> : vector<i1>`
- `vector.constant_mask [1] : vector<i1>` which gets lowered to
  `arith.constant dense<true> : vector<i1>`

(the attribute list for the 0-D case must be a singleton containing
either `0` or `1`)
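
The relaxed verifier rule and the 0-D lowering described above can be sketched as follows. The function names are hypothetical, and the n-D case only models the length check mentioned in this message.

```python
def verify_constant_mask(mask_dim_sizes, vector_shape):
    """Return True if the attribute list is acceptable for the vector shape."""
    if len(vector_shape) == 0:
        # 0-D case: a singleton containing either 0 or 1.
        return len(mask_dim_sizes) == 1 and mask_dim_sizes[0] in (0, 1)
    # n-D case: one entry per vector dimension.
    return len(mask_dim_sizes) == len(vector_shape)


def lower_0d_constant_mask(mask_dim_sizes):
    """Model of the 0-D lowering: [0] becomes false, [1] becomes true."""
    if not verify_constant_mask(mask_dim_sizes, ()):
        raise ValueError("invalid 0-D constant mask")
    return mask_dim_sizes[0] == 1


print(verify_constant_mask([1], ()))
print(verify_constant_mask([2], ()))
print(lower_0d_constant_mask([0]))
```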

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D115023
2021-12-06 08:03:04 +00:00
Mehdi Amini afb0582325 Fix TOSA verifier to emit verbose errors
Also add a test for invalid ops, which was missing.
2021-12-05 19:16:54 +00:00
Butygin 91072b74f8 [mlir] Add InlinerInterface to bufferization dialect
Differential Revision: https://reviews.llvm.org/D115080
2021-12-04 23:45:56 +03:00
Uday Bondhugula 2108ed0671 [MLIR] Fix affine.for unroll for multi-result upper bound maps
Fix affine.for unroll for multi-result upper bound maps: these can't be
unrolled/unroll-and-jammed in cases where the trip count isn't known to
be a multiple of the unroll factor.
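
The condition described above can be sketched as a simple legality check. This is a simplified model: the trip count is either a known integer or unknown (`None`), and the function name is hypothetical.

```python
def can_unroll(num_upper_bound_results, trip_count, unroll_factor):
    """Return True if unrolling by unroll_factor is allowed in this toy model.

    num_upper_bound_results: number of results of the upper bound map.
    trip_count: known constant trip count, or None if unknown.
    """
    if unroll_factor <= 0:
        raise ValueError("unroll_factor must be positive")
    if num_upper_bound_results <= 1:
        # Single-result bounds are not restricted by this check.
        return True
    # Multi-result bounds: require a trip count known to divide evenly.
    return trip_count is not None and trip_count % unroll_factor == 0


print(can_unroll(2, 8, 4))
print(can_unroll(2, 10, 4))
print(can_unroll(2, None, 4))
```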

Fix and clean up repeated/unnecessary checks/comments at helper callees.

Also, fix clang-tidy variable naming warnings and redundant includes.

Differential Revision: https://reviews.llvm.org/D114662
2021-12-04 07:20:26 +05:30
natashaknk e2d8b60742 Revert "[mlir][tosa] Add tosa.conv2d as fully_connected canonicalization"
This reverts commit 13bdb7ab4a. The commit introduced/uncovered an unintended bug in models containing Conv2D.

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D115079
2021-12-03 14:35:48 -08:00
Matthias Springer ad1ba42f68 [mlir][linalg][bufferize] Allow unbufferizable ops in input
Allow ops that are not bufferizable in the input IR. (Deactivated by default.)

bufferization::ToMemrefOp and bufferization::ToTensorOp are generated at the bufferization boundaries.

Differential Revision: https://reviews.llvm.org/D114669
2021-12-03 20:20:46 +09:00
Michal Terepeta 1423e8bf5d [mlir][Vector] Support 0-D vectors in `BitCastOp`
The implementation only allows bit-casting between two 0-D vectors. We could
probably support casting from/to vectors like `vector<1xf32>`, but I wasn't
convinced that this would be important and it would require breaking the
invariant that `BitCastOp` works only on vectors with equal rank.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114854
2021-12-03 08:55:59 +00:00
Michal Terepeta 8e2b373396 [mlir][Vector] Add some missing tests for `broadcast` and `splat`
Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114853
2021-12-03 08:52:51 +00:00
Matthias Springer d30fcadf07 [mlir][linalg][bufferize] Op interface implementation for Bufferization dialect ops
This change provides `BufferizableOpInterface` implementations for ops from the Bufferization dialects. These ops are needed at the bufferization boundaries for partial bufferization.

Differential Revision: https://reviews.llvm.org/D114618
2021-12-03 16:25:44 +09:00
Matthias Springer 4479138de8 [mlir][linalg][bufferize] Bufferization of tensor.insert
This is a lightweight operation, useful for writing unit tests. It will be utilized for testing in subsequent commits.

Differential Revision: https://reviews.llvm.org/D114693
2021-12-02 11:58:01 +09:00
Nicolas Vasilache c537a94334 [mlir][Vector] Thread 0-d vectors through vector.transfer ops
This revision adds 0-d vector support to vector.transfer ops.
In the process, numerous cleanups are applied, in particular around normalizing
and reducing the number of builders.

Reviewed By: ThomasRaoux, springerm

Differential Revision: https://reviews.llvm.org/D114803
2021-12-01 16:49:43 +00:00
Matthias Springer 2fd0ea960c [mlir][linalg][bufferize] CallOps do not bufferize to memory writes
However, since CallOps have no aliasing OpResults, their OpOperands always bufferize out-of-place.

This change removes `bufferizesToMemoryWrite` from `CallOpInterface`. This method was called, but its return value did not matter.

Differential Revision: https://reviews.llvm.org/D114616
2021-12-01 18:47:28 +09:00
Thomas Raoux 69a8a7cf2d [mlir] Make sure linearizeCollapsedDims doesn't drop input map dims
The new affine map generated by linearizeCollapsedDims should not drop
dimensions. We need to make sure we create a map with at least as many
dimensions as the source map. This prevents
FoldProducerReshapeOpByLinearization from generating invalid IR.

This solves a regression in IREE due to e4e4da86af

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D114838

This reverts commit 9a844c2a9b.
2021-11-30 22:51:56 -08:00
MaheshRavishankar 9a844c2a9b Revert "[mlir] Make sure linearizeCollapsedDims doesn't drop input map dims"
This reverts commit bc38673e4d.
2021-11-30 22:43:46 -08:00
MaheshRavishankar bc38673e4d [mlir] Make sure linearizeCollapsedDims doesn't drop input map dims
The new affine map generated by linearizeCollapsedDims should not drop
dimensions. We need to make sure we create a map with at least as many
dimensions as the source map. This prevents
FoldProducerReshapeOpByLinearization from generating invalid IR.

This solves a regression in IREE due to e4e4da86af

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D114838
2021-11-30 22:37:53 -08:00
Aart Bik 0e85232fa3 [mlir][sparse] refine simply dynamic sparse tensor outputs
Proper test for sparse tensor outputs is a single condition throughout
the whole tensor index expression (not a general conjunction, since this
may include other conditions that cause cancellation).

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D114810
2021-11-30 13:45:58 -08:00
Nicolas Vasilache a08b750ce9 [mlir][tensor] InsertSliceOp verification.
This revision reintroduces tensor.insert_slice verification which seems
to have vanished over time: a verifier was initially introduced in cf9503c1b7
but for some reason the invalid.mlir was not properly updated; as time passed the verifier was not called anymore and later the code was deleted.

As a consequence, a non-negligible portion of tests has run astray using invalid
tensor.insert_slice semantics and needed to be fixed.

Also, extract isRankReducedType from TensorOps for better reuse.
Originally, this facility was used by both tensor and memref forms but
it got copied around as dialects were split.

Differential Revision: https://reviews.llvm.org/D114715
2021-11-30 20:37:06 +00:00
MaheshRavishankar 311dd55c9e [mlir][MemRef] Fix SubViewOp canonicalization when a subset of unit-dims are dropped.
The canonical type of the result of the `memref.subview` needs to make
sure that the previously dropped unit-dimensions are the ones dropped
for the canonicalized type as well. This means the generic
`inferRankReducedResultType` cannot be used. Instead the current
dropped dimensions need to be queried and the same need to be dropped.

Reviewed By: nicolasvasilache, ThomasRaoux

Differential Revision: https://reviews.llvm.org/D114751
2021-11-30 20:37:06 +00:00
not-jenni 13bdb7ab4a [mlir][tosa] Add tosa.conv2d as fully_connected canonicalization
For a 1x1 weight and stride of 1, the input/weight can be reshaped and passed into a fully connected op then reshaped back

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D114757
2021-11-30 12:01:14 -08:00
gysit c8f2139eb0 [mlir][linalg] Add decompose to CodegenStrategy.
Add the decompose patterns that lower higher dimensional convolutions to lower dimensional ones to CodegenStrategy and use CodegenStrategy to test the decompose patterns. Additionally, remove the assertion that checks the anchor op name is set in the CodegenStrategyTest pass. Removing the assertion allows us to simplify the pipelines used in the interchange and decompose tests.

Depends On D114797

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114798
2021-11-30 15:48:29 +00:00
gysit 316e627c2b [mlir][linalg] Support the empty anchor op string when padding.
Add support for an empty anchor op string in vectorization. An empty anchor op string is useful after fusion when there are multiple different operations to vectorize.

Depends On D114689

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114690
2021-11-30 15:32:13 +00:00
gysit 7f7103cd06 [mlir][linalg] Use top down traversal for padding.
Pad the operation using a top down traversal. The top down traversal unlocks folding opportunities and dim op canonicalizations due to the introduced extract slice operation after the padded operation.

Depends On D114585

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114689
2021-11-30 15:30:45 +00:00
gysit 914e72d400 [mlir][linalg] Run CSE after every CodegenStrategy transformation.
Add CSE after every transformation. Transformations such as tiling introduce redundant computation, for example, one AffineMinOp for every operand dimension pair. Follow-up transformations such as Padding and Hoisting benefit from CSE since comparing slice sizes simplifies to comparing SSA values instead of analyzing affine expressions.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114585
2021-11-30 15:07:51 +00:00
Alexander Belyaev f89bb3c012 [mlir] Move bufferization-related passes to `bufferization` dialect.
[RFC](https://llvm.discourse.group/t/rfc-dialect-for-bufferization-related-ops/4712)

Differential Revision: https://reviews.llvm.org/D114698
2021-11-30 09:58:47 +01:00
Aart Bik 7d4da4e1ab [mlir][sparse] generalize sparse tensor output implementation
Moves sparse tensor output support forward by generalizing from injective
insertions only to include reductions. This revision accepts the case with all
parallel outer and all reduction inner loops, since that can be handled with
an injective insertion still. Next revision will allow the inner parallel loop
to move inward (but that will require "access pattern expansion" aka "workspace").

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D114399
2021-11-29 16:15:53 -08:00
Benjamin Kramer 8d474f1d15 [mlir] Handle an edge case when folding reshapes with multiple trailing 1 dimensions
We would exit early and miss this case.

Differential Revision: https://reviews.llvm.org/D114711
2021-11-29 18:31:43 +01:00
Stephan Herhut 95f34e318c [mlir][memref] Fix bug in verification of memref.collapse_shape
The verifier computed an illegal type with negative dimension size when collapsing partially static memrefs.

Differential Revision: https://reviews.llvm.org/D114702
2021-11-29 15:47:12 +01:00
Nicolas Vasilache f5a9bfdf8f [mlir] NFC - Move invalid.mlir tests to the proper dialects
2021-11-28 21:30:40 +00:00
Mats Petersson 30238c3676 [mlir][OpenMP] Add support for SIMD modifier
Add support for SIMD modifier in OpenMP worksharing loops.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D111051
2021-11-26 14:04:46 +00:00
Stanislav Funiak a76ee58f3c Multi-root PDL matching using upward traversals.
This is commit 4 of 4 for the multi-root matching in PDL, discussed in https://llvm.discourse.group/t/rfc-multi-root-pdl-patterns-for-kernel-matching/4148 (topic flagged for review).

This PR integrates the various components (root ordering algorithm, nondeterministic execution of PDL bytecode) to implement multi-root PDL matching. The main idea is for the pattern to specify multiple candidate roots. The PDL-to-PDLInterp lowering selects one of these roots and "hangs" the pattern from this root, traversing the edges downwards (from operation to its operands) when possible and upwards (from values to its uses) when needed. The root is selected by invoking the optimal matching multiple times, once for each candidate root, and the connectors are determined from the optimal matching. The costs in the directed graph are equal to the number of upward edges that need to be traversed when connecting the given two candidate roots. It can be shown that, for this choice of the cost function, "hanging" the pattern from an inner node is no better than from the optimal root.

The following four main additions were implemented as a part of this PR:
1. OperationPos predicate has been extended to allow tracing the operation accepting a value (the opposite of operation defining a value).
2. Predicate checking if two values are not equal - this is useful to ensure that we do not traverse the edge back downwards after we traversed it upwards.
3. Function for building the cost graph among the candidate roots.
4. Updated buildPredicateList, building the predicates once the optimal branching has been determined.

Testing: unit tests (an integration test to follow once the stack of commits has landed)

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D108550
2021-11-26 18:11:37 +05:30
Stanislav Funiak 842b6861c0 Defines new PDLInterp operations needed for multi-root matching in PDL.
This is commit 1 of 4 for the multi-root matching in PDL, discussed in https://llvm.discourse.group/t/rfc-multi-root-pdl-patterns-for-kernel-matching/4148 (topic flagged for review).

These operations are:
* pdl.get_accepting_ops: Returns a list of operations accepting the given value or a range of values at the specified position. Thus if there are two operations `%op1 = "foo"(%val)` and `%op2 = "bar"(%val)` accepting a value at position 0, `%ops = pdl_interp.get_accepting_ops of %val : !pdl.value at 0` will return both of them. This allows us to traverse upwards from a value to operations accepting the value.
* pdl.choose_op: Iteratively chooses one operation from a range of operations. Therefore, writing `%op = pdl_interp.choose_op from %ops` in the example above will select either `%op1` or `%op2`.
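
The upward traversal enabled by these two operations can be sketched with a toy use list. The data model (ops as name and operand list pairs) and the Python function names are hypothetical and only mirror the behaviour described in the bullets above.

```python
def get_accepting_ops(ops, value, index):
    """Return the names of all ops that take `value` as operand number `index`."""
    return [
        name
        for name, operands in ops
        if index < len(operands) and operands[index] == value
    ]


def choose_op(candidates):
    """Yield candidates one at a time, modelling the iterative choice."""
    yield from candidates


# Two ops accept %val at position 0; a third accepts it at position 1.
ops = [
    ("foo", ["%val"]),
    ("bar", ["%val"]),
    ("baz", ["%other", "%val"]),
]

accepting = get_accepting_ops(ops, "%val", 0)
print(accepting)
for op in choose_op(accepting):
    print("trying", op)
```

A matcher built on this would try each chosen op in turn and backtrack to the next candidate when the rest of the pattern fails to match.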

Testing: Added the corresponding test cases to mlir/test/Dialect/PDLInterp/ops.mlir.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D108543
2021-11-26 17:59:22 +05:30
Tobias Gysi 8d07ba817c [mlir][linalg] Simplify the hoist padding tests.
Use primarily matvec instead of matmul to test hoist padding. Test the hoisting only starting from already padded IR. Use one-dimensional tiling only except for the tile_and_fuse test that exercises hoisting on a larger loop nest with fill and pad tensor operations in the backward slice.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114608
2021-11-26 07:40:22 +00:00
Matthias Springer c94b80b438 [mlir][linalg][bufferize][NFC] Allow returning arbitrary memrefs
If `allowReturnMemref` is set to true, arbitrary memrefs may be returned from FuncOps. Also remove allocation hoisting code, which is only partly implemented at the moment.

The purpose of this commit is to untangle `bufferize` from `aliasInfo`. (Even with this change, they are not fully untangled yet.)

Differential Revision: https://reviews.llvm.org/D114507
2021-11-26 11:26:46 +09:00
Alexander Belyaev 57470abc41 [mlir] Move memref.[tensor_load|buffer_cast|clone] to "bufferization" dialect.
https://llvm.discourse.group/t/rfc-dialect-for-bufferization-related-ops/4712

Differential Revision: https://reviews.llvm.org/D114552
2021-11-25 11:50:39 +01:00
Tobias Gysi 43dc6d5d57 [mlir][linalg] Cleanup hoisting test (NFC).
Rename the check prefixes to HOIST21 and HOIST32 to clarify the different flag configurations.

Depends On D114438

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114442
2021-11-25 10:42:24 +00:00
Tobias Gysi 4b03906346 [mlir][linalg] Perform checks early in hoist padding.
Instead of checking for unexpected operations (any operation with a region except for scf::For and `padTensorOp`, or operations with a memory effect) while cloning the packing loop nest, perform the checks early. Update `dropNonIndexDependencies` to check for unexpected operations. Additionally, check that all of these operations have index type operands only.

Depends On D114428

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114438
2021-11-25 10:37:12 +00:00
Tobias Gysi fd723eaa92 [mlir][linalg] Limit hoist padding to constant paddings.
Limit hoist padding to pad tensor ops that depend only on a constant value. Supporting arbitrary padding values that depend on computations that are part of the backward slice to hoist requires complex analysis to ensure the computation can be hoisted.

Depends On D114420

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114428
2021-11-25 10:31:39 +00:00
Tobias Gysi ed7c1fb9b0 [mlir][linalg] Add backward slice filtering in hoist padding.
Adapt hoist padding to filter the backward slice before cloning the packing loop nest. The filtering removes all operations that are not used to index the hoisted pad tensor op and its extract slice op. The filtering is needed to support the more complex loop nests created after fusion. For example, fusing the producer of an output operand can add linalg ops and pad tensor ops to the backward slice. These operations have regions and currently prevent hoisting.

The following example demonstrates the effect of the newly introduced `dropNonIndexDependencies` method that filters the backward slice:
```
%source = linalg.fill(%cst, %arg0)
scf.for %i
  %unrelated = linalg.fill(%cst, %arg1)    // not used to index %source!
  scf.for %j (%arg2 = %unrelated)
    scf.for %k                             // not used to index %source!
      %ubi = affine.min #map(%i)
      %ubj = affine.min #map(%j)
      %slice = tensor.extract_slice %source [%i, %j] [%ubi, %ubj]
      %padded_slice = linalg.pad_tensor %slice
```
dropNonIndexDependencies(%padded_slice, %slice)
removes [scf.for %k, linalg.fill(%cst, %arg1)] from backwardSlice.
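
The filtering can be pictured as a reachability walk over index dependencies. The sketch below is not the actual `dropNonIndexDependencies` implementation: it is a simplified Python model in which operations are plain strings, and the function name, the `index_deps` map, and the `roots` list are hypothetical. It assumes `roots` contains the operations that produce the index operands of the slice and pad ops.

```python
def drop_non_index_dependencies(backward_slice, index_deps, roots):
    """Keep only ops that are (transitively) used to compute indices.

    backward_slice: ops in the backward slice, in program order.
    index_deps: maps an op to the ops producing its index operands.
    roots: ops producing the index operands of the slice and pad ops.
    """
    keep = set()
    worklist = list(roots)
    while worklist:
        op = worklist.pop()
        if op in keep or op not in backward_slice:
            continue
        keep.add(op)
        worklist.extend(index_deps.get(op, []))
    # Preserve the original order of the backward slice.
    return [op for op in backward_slice if op in keep]


# Simplified model of the example above.
backward_slice = ["for_i", "fill_unrelated", "for_j", "for_k", "min_i", "min_j"]
index_deps = {"min_i": ["for_i"], "min_j": ["for_j"]}
# %slice is indexed by %i, %j, %ubi, and %ubj.
roots = ["for_i", "for_j", "min_i", "min_j"]

filtered = drop_non_index_dependencies(backward_slice, index_deps, roots)
removed = [op for op in backward_slice if op not in filtered]
print(filtered)
print(removed)
```

In this model, `for_k` and `fill_unrelated` are dropped because neither contributes to an index of the extracted slice.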

Depends On D114175

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114420
2021-11-25 10:30:10 +00:00
Alexander Belyaev 3c228573bc Revert "[mlir][SCF] Further simplify affine maps during `for-loop-canonicalization`"
This reverts commit ee1bf18672.

It breaks IREE lowering. Reverting the commit for now while we
investigate what's going on.
2021-11-25 10:54:52 +01:00
Matthias Springer ee1bf18672 [mlir][SCF] Further simplify affine maps during `for-loop-canonicalization`
* Implement `FlatAffineConstraints::getConstantBound(EQ)`.
* Inject a simpler constraint for loops that have at most 1 iteration.
* Take into account constant EQ bounds of FlatAffineConstraints dims/symbols during canonicalization of the resulting affine map in `canonicalizeMinMaxOp`.

Differential Revision: https://reviews.llvm.org/D114138
2021-11-25 12:44:19 +09:00
Tobias Gysi b6e7b1be73 [mlir][linalg] Simplify padding test (NFC).
The padding tests previously contained the tile loops. This revision removes the tile loops since padding itself does not consider the loops. Instead the induction variables are passed in as function arguments which promotes them to symbols in the affine expressions. Note that the pad-and-hoist.mlir test still exercises padding in the context of the full loop nest.

Depends On D114175

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114227
2021-11-24 19:21:50 +00:00