Commit Graph

1699 Commits

Author SHA1 Message Date
Tobias Gysi bf28849745 [mlir][linalg] Retire PoolingMaxOp/PoolingMinOp/PoolingSumOp.
The pooling ops are among the last remaining hard-coded Linalg operations that have no region attached. They became obsolete with the OpDSL pooling operations. Removing them allows us to delete specialized code and tests that are not needed for the OpDSL counterparts, which rely on the standard code paths.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D110909
2021-10-01 13:51:56 +00:00
Uday Bondhugula 08b63db8bb [MLIR][GPU] Add GPU launch op support for dynamic shared memory
Add support for dynamic shared memory for GPU launch ops: add an
optional operand to gpu.launch and gpu.launch_func ops to specify the
amount of "dynamic" shared memory to use. Update lowerings to connect
this operand to the GPU runtime.

Differential Revision: https://reviews.llvm.org/D110800
2021-10-01 16:46:07 +05:30
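A minimal sketch of a launch using the new operand; the kernel symbol, shapes, and byte count are illustrative, and the exact assembly keyword is an assumption based on the commit description.

```mlir
// Hypothetical kernel module @kernels::@my_kernel; the i32 operand gives the
// amount of dynamic shared memory, in bytes, to reserve for the launch.
func @launch_with_dynamic_smem(%data: memref<?xf32>) {
  %c1 = constant 1 : index
  %smem = constant 4096 : i32
  gpu.launch_func @kernels::@my_kernel
      blocks in (%c1, %c1, %c1)
      threads in (%c1, %c1, %c1)
      dynamic_shared_memory_size %smem
      args(%data : memref<?xf32>)
  return
}
```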
Lei Zhang cb2e651800 [mlir][linalg] Fix incorrect bound calculation for tiling conv
For convolution, the input window dimension's access affine map
is of the form `(d0 * s0 + d1)`, where `d0`/`d1` is the output/
filter window dimension, and `s0` is the stride.

When tiling, https://reviews.llvm.org/D109267 changed the way
dimensions are acquired. Instead of directly querying them using
`*.dim` ops on the original convolution op, we now get them by
applying the access affine map to the loop upper bounds. This
is fine for dimensions with single-dimension affine maps,
like matmul, but not for the convolution input: it causes
incorrect computation and out-of-bounds accesses. As a concrete
example, say we have a 1x225x225x3 (NHWC) input, a 3x3x3x32 (HWCF)
filter, and a 1x112x112x32 (NHWC) output with stride 2; (112 * 2 + 3)
would be 227, which differs from the correct input window dimension
size of 225.

Instead, we should first calculate the maximum indices for each loop,
apply the affine map to them, and then add one to get the dimension
size. Note this makes no difference for matmul-like ops, given that
they effectively compute `d0 - 1 + 1`.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D110849
2021-09-30 13:50:57 -04:00
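A worked version of that arithmetic, using the window sizes from the example above (output window 112, filter window 3, stride 2):

```latex
\underbrace{112 \cdot 2 + 3}_{\text{substitute loop upper bounds}} = 227
\qquad\text{vs.}\qquad
\underbrace{(112 - 1) \cdot 2 + (3 - 1)}_{\text{substitute max indices}} + 1 = 225
```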
Stella Laurenzo 267bb194f3 [mlir] Remove old "tc" linalg ods generator.
* This could have been removed some time ago as it only had one op left in it, which is redundant with the new approach.
* `matmul_i8_i8_i32` (the remaining op) can be trivially replaced by `matmul`, which natively supports mixed precision.

Differential Revision: https://reviews.llvm.org/D110792
2021-09-30 16:30:06 +00:00
Matthias Springer 27451a05ed [mlir][vector] Fold transfer ops and tensor.extract/insert_slice.
* Fold vector.transfer_read and tensor.extract_slice.
* Fold vector.transfer_write and tensor.insert_slice.

Differential Revision: https://reviews.llvm.org/D110627
2021-09-30 09:28:00 +09:00
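A sketch of the first folding (a transfer_read of an extract_slice); the shapes, offsets, and value names are illustrative.

```mlir
func @fold_read_of_slice(%t: tensor<?x?xf32>, %i: index, %j: index,
                         %pad: f32) -> vector<4x8xf32> {
  %c0 = constant 0 : index
  // Before folding: transfer_read of an extract_slice.
  %slice = tensor.extract_slice %t[%i, %j] [4, 8] [1, 1]
      : tensor<?x?xf32> to tensor<4x8xf32>
  %v = vector.transfer_read %slice[%c0, %c0], %pad
      : tensor<4x8xf32>, vector<4x8xf32>
  // After folding (equivalent): read directly at the slice offsets.
  // %v = vector.transfer_read %t[%i, %j], %pad
  //     : tensor<?x?xf32>, vector<4x8xf32>
  return %v : vector<4x8xf32>
}
```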
Nicolas Vasilache 92ea624a13 [mlir][Linalg] Rewrite CodegenStrategy to populate a pass pipeline.
This revision retires a good portion of the complexity of the codegen strategy and expresses the logic as a pass pipeline.

Differential revision: https://reviews.llvm.org/D110678
2021-09-29 13:35:45 +00:00
bakhtiyar bdde959533 Remove unnecessary async group creates and awaits.
Reviewed By: ezhulenev

Differential Revision: https://reviews.llvm.org/D110605
2021-09-28 14:52:08 -07:00
Amy Zhuang 7ab14b8886 [mlir] Unroll-and-jam loops with iter_args.
Unroll-and-jam currently doesn't work when the loop being unroll-and-jammed
or any of its inner loops has iter_args. This patch modifies the
unroll-and-jam utility to support loops with iter_args.

Reviewed By: bondhugula

Differential Revision: https://reviews.llvm.org/D110085
2021-09-28 14:13:27 -07:00
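A sketch of the kind of loop nest this now covers: an outer affine.for carrying an accumulator through iter_args (shapes and names are illustrative).

```mlir
// An outer loop carrying a running sum through iter_args; the
// unroll-and-jam utility can now handle loops of this shape.
func @row_sums(%A: memref<64x64xf32>) -> f32 {
  %zero = constant 0.0 : f32
  %total = affine.for %i = 0 to 64 iter_args(%acc = %zero) -> (f32) {
    %row = affine.for %j = 0 to 64 iter_args(%racc = %acc) -> (f32) {
      %v = affine.load %A[%i, %j] : memref<64x64xf32>
      %n = addf %racc, %v : f32
      affine.yield %n : f32
    }
    affine.yield %row : f32
  }
  return %total : f32
}
```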
thomasraoux b12e4c17e0 [mlir] Fix bug in FoldSubview with rank reducing subview
Fix how we calculate the new permutation map of the transfer ops.

Differential Revision: https://reviews.llvm.org/D110638
2021-09-28 13:18:29 -07:00
Alexander Belyaev 9fb57c8c1d [mlir] Add min/max operations to Standard.
[RFC: Add min/max ops](https://llvm.discourse.group/t/rfc-add-min-max-operations/4353)

I followed the naming style of the Arith dialect in
https://reviews.llvm.org/D110200,
i.e., similar to DivSIOp and DivUIOp, I defined MaxSIOp and MaxUIOp.

Once the Arith PR lands, I will migrate these ops as well.

Differential Revision: https://reviews.llvm.org/D110540
2021-09-28 09:40:22 +02:00
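A minimal sketch of the new ops in IR, assuming the assembly mnemonics mirror the op names (`maxsi`, `minsi`, and so on):

```mlir
// Clamp %x into [%lo, %hi] using the new signed max/min ops.
func @clamp(%x: i32, %lo: i32, %hi: i32) -> i32 {
  %0 = maxsi %x, %lo : i32
  %1 = minsi %0, %hi : i32
  return %1 : i32
}
```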
Tobias Gysi d20d0e145d [mlir][linalg] Finer-grained padding control.
Adapt the signature of the PaddingValueComputationFunction callback to either return the padding value or failure to signal that padding is not desired.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D110572
2021-09-27 19:21:37 +00:00
Aart Bik ec97a205c3 [mlir][sparse] preserve zero-initialization for materializing buffers
This revision makes sure that when the output buffer materializes locally
(in contrast with the passing in of output tensors either in-place or not
in-place), the zero initialization assumption is preserved. This also adds
a bit more documentation on our sparse kernel assumptions (viz. the TACO
assumptions).

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D110442
2021-09-27 11:22:05 -07:00
Bixia Zheng fbd5821c6f Implement the conversion from sparse constant to sparse tensors.
The sparse constant provides a constant tensor in coordinate format. We first split the sparse constant into a constant tensor for indices and a constant tensor for values. We then generate a loop to fill a sparse tensor in coordinate format using the tensors for the indices and the values. Finally, we convert the sparse tensor in coordinate format to the destination sparse tensor format.

Add tests.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D110373
2021-09-27 09:47:29 -07:00
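An illustrative sketch, assuming a CSR-style `#sparse_tensor.encoding` and using `sparse_tensor.convert` to stand for the conversion; the exact IR produced by this revision may differ.

```mlir
#CSR = #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>

func @sparse_const_to_tensor() -> tensor<3x4xf64, #CSR> {
  // Coordinate (COO) format: a list of index tuples and a list of values.
  %coo = constant sparse<[[0, 0], [1, 2], [2, 3]], [1.0, 2.0, 3.0]>
      : tensor<3x4xf64>
  // The conversion fills a sparse tensor from the indices/values tensors.
  %s = sparse_tensor.convert %coo : tensor<3x4xf64> to tensor<3x4xf64, #CSR>
  return %s : tensor<3x4xf64, #CSR>
}
```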
Eugene Zhulenev 92db09cde0 [mlir] AsyncRuntime: use int64_t for ref counting operations
Workaround for SystemZ ABI problem: https://bugs.llvm.org/show_bug.cgi?id=51898

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D110550
2021-09-27 07:55:01 -07:00
Matthias Springer ffdf0a370d [mlir][vector] Fix bug in vector-transfer-full-partial-split
When splitting with linalg.copy, we cannot write into the destination alloc directly. Instead, we write into a subview of the alloc.

Differential Revision: https://reviews.llvm.org/D110512
2021-09-27 18:12:17 +09:00
Lei Zhang b45476c94c [mlir][tosa] Do not fold transpose with quantized types
For such cases, the type of the constant DenseElementsAttr is
different from the transpose op return type.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D110446
2021-09-24 16:57:55 -04:00
River Riddle aca9bea199 [mlir:MemRef] Move DmaStartOp/DmaWaitOp to ODS
These are among the last operations still defined explicitly in C++. I've
tried to keep this commit as NFC as possible, but these ops
definitely need a non-NFC cleanup at some point.

Differential Revision: https://reviews.llvm.org/D110440
2021-09-24 19:35:28 +00:00
Lei Zhang e325ebb9c7 [mlir][tosa] Add some transpose folders
* If the input is a constant splat value, we just
  need to reshape it.
* If the input is a general constant with one user,
  we can also constant fold it, without bloating
  the IR.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D110439
2021-09-24 15:25:14 -04:00
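A sketch of the second case (a general constant with a single user); the op spellings follow the TOSA dialect's generic form, and the values are illustrative.

```mlir
// %input has a single user (%0), so the transpose can be folded into a
// transposed constant without bloating the IR.
func @fold_transpose() -> tensor<3x2xf32> {
  %perms = "tosa.const"() {value = dense<[1, 0]> : tensor<2xi32>}
      : () -> tensor<2xi32>
  %input = "tosa.const"() {value = dense<[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]>
      : tensor<2x3xf32>} : () -> tensor<2x3xf32>
  %0 = "tosa.transpose"(%input, %perms)
      : (tensor<2x3xf32>, tensor<2xi32>) -> tensor<3x2xf32>
  return %0 : tensor<3x2xf32>
}
```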
Alex Zinenko 5988a3b7a0 [mlir] Linalg: ensure tile-and-pad always creates padding as requested
Initially, the padding transformation and the related operation were only used
to guarantee static shapes of subtensors in tiled operations. The
transformation would not insert the padding operation if the shapes were
already static, and the overall code generation would actively remove such
"noop" pads. However, this transformation can be also used to pack data into
smaller tensors and marshall them into faster memory, regardless of the size
mismatches. In the context of expert-driven transformation, we should assume
that, if padding is requested, a potentially padded tensor must always be
created. Update the transformation accordingly. To do this, introduce an
optional `packing` attribute on the `pad_tensor` op that serves as an
indication that the padding is an intentional choice (as opposed to a side
effect of type normalization) and should be left alone by cleanups.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D110425
2021-09-24 18:40:13 +02:00
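For reference, a minimal `linalg.pad_tensor` as it looks at this point; the exact placement of the new `packing` unit attribute in the custom syntax is not shown and is left as an assumption.

```mlir
func @pad_to_static(%arg0: tensor<6x13xf32>) -> tensor<8x16xf32> {
  %cst = constant 0.0 : f32
  // Per the revision above, marking the op with the new `packing` unit
  // attribute tells cleanups to keep the pad even when it only serves to
  // normalize the type.
  %0 = linalg.pad_tensor %arg0 low[0, 0] high[2, 3] {
  ^bb0(%i: index, %j: index):
    linalg.yield %cst : f32
  } : tensor<6x13xf32> to tensor<8x16xf32>
  return %0 : tensor<8x16xf32>
}
```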
Alex Zinenko 3f89e339bb [mlir] add pad_tensor(tensor.cast) -> pad_tensor canonicalizer
This canonicalization pattern complements the tensor.cast(pad_tensor) one in
propagating constant type information when possible. It contributes to the
feasibility of pad hoisting.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D110343
2021-09-24 12:03:47 +02:00
Matthias Springer f3f25ffc04 [mlir][linalg] Fix result type in FoldSourceTensorCast
* Do not discard static result type information that cannot be inferred from lower/upper padding.
* Add optional argument to `PadTensorOp::inferResultType` for specifying known result dimensions.

Differential Revision: https://reviews.llvm.org/D110380
2021-09-24 16:47:18 +09:00
Matthias Springer 2190f8a8b1 [mlir][linalg] Support tile+peel with TiledLoopOp
Only scf.for was supported until now.

Differential Revision: https://reviews.llvm.org/D110220
2021-09-24 10:23:31 +09:00
Matthias Springer 8dc16ba8d2 [mlir][linalg] Merge all tiling passes into a single one.
Passes such as `linalg-tile-to-tiled-loop` are merged into `linalg-tile`.

Differential Revision: https://reviews.llvm.org/D110214
2021-09-24 10:16:46 +09:00
Aart Bik a924fcc7c3 [mlir][sparse] add sparse kernels test to sparse compiler test suite
This test makes sure kernels map to efficient sparse code, i.e. all
compressed for-loops, no co-iterating while loops. In addition, this
revision removes the special constant folding inside the sparse
compiler in favor of Mahesh's new generic linalg folding. Thanks!

NOTE: relies on Mahesh's fix, which needs to be rebased first

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D110001
2021-09-22 14:56:39 -07:00
MaheshRavishankar a40a08ed98 [mlir][Linalg] Teach constant -> generic op fusion to handle scalar constants.
The current folder of constant -> generic op only handles splat
constants. The same logic holds for scalar constants. Teach the
pattern to handle such cases.

Differential Revision: https://reviews.llvm.org/D109982
2021-09-22 13:41:47 -07:00
Aart Bik 5da21338bc [mlir][sparse] generalize reduction support in sparse compiler
Now not just SUM, but also PRODUCT, AND, OR, XOR. The reductions
MIN and MAX are still to be done (also depends on recognizing
these operations in cmp-select constructs).

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D110203
2021-09-22 12:36:46 -07:00
Tobias Gysi 8b5236def5 [mlir][linalg] Simplify slice dim computation for fusion on tensors (NFC).
Compute the tiled producer slice dimensions directly, starting from the consumer and not using the producer at all.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D110147
2021-09-21 15:09:46 +00:00
Nicolas Vasilache 101d017a64 [mlir][Linalg] Revisit heuristic ordering of tensor.insert_slice in comprehensive bufferize.
It was previously assumed that tensor.insert_slice should be bufferized first in a greedy fashion to avoid out-of-place bufferization of the large tensor. This heuristic does not hold upon further inspection.

This CL removes the special handling of such ops and adds a test that exhibits better behavior and appears in real use cases.

The only test adversely affected is an artificial test which results in a returned memref: this pattern is not allowed by comprehensive bufferization in real scenarios anyway and the offending test is deleted.

Differential Revision: https://reviews.llvm.org/D110072
2021-09-21 14:22:45 +00:00
Nicolas Vasilache 0d2c54e851 [mlir][Linalg] Revisit RAW dependence interference in comprehensive bufferize.
Previously, comprehensive bufferize would consider all aliasing reads and writes to
the result buffer and matching operand. This introduced spurious dependences
and resulted in too many unnecessary copies.

Instead, this revision revisits the gathering of read and write alias sets.
This results in fewer alloc and copies.
An exhaustive test case is added that considers all possible permutations of
`matmul(extract_slice(fill), extract_slice(fill), ...)`.
2021-09-21 14:22:22 +00:00
Morten Borup Petersen 032cb1650f [MLIR][SCF] Add for-to-while loop transformation pass
This pass transforms SCF.ForOp operations to SCF.WhileOp. The for-loop condition is placed in the 'before' region of the while operation, and the induction variable incrementation plus the loop body in the 'after' region. The loop-carried values of the while op are the induction variable (IV) of the for-loop plus any iter_args specified for the for-loop.
Any 'yield' ops in the for-loop are rewritten to additionally yield the (incremented) induction variable.

This transformation is useful for passes where we want to consider structured control flow solely on the basis of a loop body and the computation of a loop condition. As an example, when doing high-level synthesis in CIRCT, the incrementation of an IV in a for-loop is "just another part" of a circuit datapath, and what we really care about is the distinction between our datapath and our control logic (the condition variable).

Differential Revision: https://reviews.llvm.org/D108454
2021-09-21 09:09:54 +01:00
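Roughly the shape of the rewrite, shown on a hypothetical reduction loop (std ops are used for the comparison and increment, since the arith dialect had not landed yet):

```mlir
// Input: an scf.for summing a buffer into an iter_arg.
func @sum_for(%lb: index, %ub: index, %step: index, %init: f32,
              %m: memref<?xf32>) -> f32 {
  %r = scf.for %i = %lb to %ub step %step iter_args(%acc = %init) -> (f32) {
    %v = memref.load %m[%i] : memref<?xf32>
    %n = addf %acc, %v : f32
    scf.yield %n : f32
  }
  return %r : f32
}

// Output shape: the loop condition lives in the "before" region; the body
// plus the IV increment live in the "after" region.
func @sum_while(%lb: index, %ub: index, %step: index, %init: f32,
                %m: memref<?xf32>) -> f32 {
  %r:2 = scf.while (%i = %lb, %acc = %init) : (index, f32) -> (index, f32) {
    %cond = cmpi slt, %i, %ub : index
    scf.condition(%cond) %i, %acc : index, f32
  } do {
  ^bb0(%iv: index, %sum: f32):
    %v = memref.load %m[%iv] : memref<?xf32>
    %n = addf %sum, %v : f32
    %next = addi %iv, %step : index
    scf.yield %next, %n : index, f32
  }
  return %r#1 : f32
}
```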
River Riddle 4f21152af1 [mlir] Tighten verification of SparseElementsAttr
SparseElementsAttr currently does not perform any verification on construction, with the only verification existing within the parser. This revision moves the parser verification to SparseElementsAttr and adds additional verification for when a sparse index is not valid.

Differential Revision: https://reviews.llvm.org/D109189
2021-09-21 01:57:42 +00:00
MaheshRavishankar 4cf9bf6c9f [mlir][MemRef] Compute unused dimensions of a rank-reducing subviews using strides as well.
For `memref.subview` operations, when there is more than one
unit dimension, the strides need to be used to figure out which of
the unit dims are actually dropped.

Differential Revision: https://reviews.llvm.org/D109418
2021-09-20 11:05:30 -07:00
MaheshRavishankar 0b33890f45 [mlir][Linalg] Add ConvolutionOpInterface.
Add an interface that allows grouping together all convolution and
pooling ops within Linalg named ops. The interface currently verifies that
- the indexing map used for input/image access is valid
- the filter and output are accessed using projected permutations
- all loops are characterizable as one iterating over
  - batch dimension,
  - output image dimensions,
  - filter convolved dimensions,
  - output channel dimensions,
  - input channel dimensions,
  - depth multiplier (for depthwise convolutions)

Differential Revision: https://reviews.llvm.org/D109793
2021-09-20 10:41:10 -07:00
Mehdi Amini 5edd79fc97 Revert "[MLIR][SCF] Add for-to-while loop transformation pass"
This reverts commit 644b55d57e.

The added test is failing the bots.
2021-09-20 17:21:59 +00:00
Tobias Gysi 7be28d82b4 [mlir][linalg] Add IndexOp support to fusion on tensors.
This revision depends on https://reviews.llvm.org/D109761 and https://reviews.llvm.org/D109766.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D109774
2021-09-20 15:59:35 +00:00
Morten Borup Petersen 644b55d57e [MLIR][SCF] Add for-to-while loop transformation pass
This pass transforms SCF.ForOp operations to SCF.WhileOp. The for-loop condition is placed in the 'before' region of the while operation, and the induction variable incrementation plus the loop body in the 'after' region. The loop-carried values of the while op are the induction variable (IV) of the for-loop plus any iter_args specified for the for-loop.
Any 'yield' ops in the for-loop are rewritten to additionally yield the (incremented) induction variable.

This transformation is useful for passes where we want to consider structured control flow solely on the basis of a loop body and the computation of a loop condition. As an example, when doing high-level synthesis in CIRCT, the incrementation of an IV in a for-loop is "just another part" of a circuit datapath, and what we really care about is the distinction between our datapath and our control logic (the condition variable).

Differential Revision: https://reviews.llvm.org/D108454
2021-09-20 16:57:50 +01:00
Tobias Gysi 6db928b8f3 [mlir][linalg] Fusion on tensors.
Add a new version of fusion on tensors that supports the following scenarios:
- supports input and output operand fusion
- fuses a producer result passed in via tile loop iteration arguments (updating the tile loop iteration arguments)
- supports only linalg operations on tensors
- supports only scf::for
- cannot add an output to the tile loop nest

The LinalgTileAndFuseOnTensors pass tiles the root operation and fuses its producers.

Reviewed By: nicolasvasilache, mravishankar

Differential Revision: https://reviews.llvm.org/D109766
2021-09-20 14:45:34 +00:00
KareemErgawy-TomTom bdcf4b9b96 [MLIR][Linalg] Make detensoring cost-model more flexible.
So far, the CF cost model for detensoring was limited to discovering
pure CF structures. This means that if, while discovering the CF component,
the cost model found any op that is not detensorable, it would give up on
detensoring altogether. This patch makes it a bit more flexible by
removing non-detensorable ops from the detensorable component without
giving up entirely.

Reviewed By: silvas

Differential Revision: https://reviews.llvm.org/D109965
2021-09-20 10:21:31 +02:00
Uday Bondhugula 57eda9becc [MLIR][GPU] Add constant propagator for gpu.launch op
Add a constant propagator for gpu.launch op in cases where the
grid/thread IDs can be trivially determined to take a single constant
value of zero.

Differential Revision: https://reviews.llvm.org/D109994
2021-09-18 12:02:46 +05:30
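A sketch of the pattern the propagator targets: every grid and block dimension is the constant 1, so the IDs can only take the value 0 (names and shapes are illustrative).

```mlir
func @single_block(%buf: memref<4xf32>, %v: f32) {
  %c1 = constant 1 : index
  gpu.launch blocks(%bx, %by, %bz) in (%gx = %c1, %gy = %c1, %gz = %c1)
             threads(%tx, %ty, %tz) in (%sx = %c1, %sy = %c1, %sz = %c1) {
    // With all sizes equal to 1, %tx can only be 0, so this store folds
    // to a store at a constant index.
    memref.store %v, %buf[%tx] : memref<4xf32>
    gpu.terminator
  }
  return
}
```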
Aart Bik d4e16171e8 [mlir][sparse] add dce test for all sparse tensor ops
Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D109992
2021-09-17 13:03:42 -07:00
thomasraoux 08f0cb7719 [mlir] Prevent crash in DropUnitDim pattern due to tensor with encoding
Differential Revision: https://reviews.llvm.org/D109984
2021-09-17 12:03:16 -07:00
thomasraoux 36aac53b36 [mlir][linalg] Extend drop unit dim pattern to all cases of reduction
Even with all-parallel loops, reading the output value is still allowed, so we
don't have to handle reduction loops differently.

Differential Revision: https://reviews.llvm.org/D109851
2021-09-17 10:09:57 -07:00
thomasraoux 416679615d [mlir] Linalg hoisting should ignore uses outside the loop
Differential Revision: https://reviews.llvm.org/D109859
2021-09-17 10:06:57 -07:00
Aart Bik b1d44e5902 [mlir][sparse] add affine subscripts to sparse compilation pass
This enables the sparsification of more kernels, such as convolutions
where there is an x(i+j) subscript. It also enables more tensor invariants
such as x(1) or other affine subscripts such as x(i+1). Currently, we
still reject sparsity altogether for tensors accessed with such subscripts.
Despite this restriction, however, we can already handle a lot more kernels
with compound subscripts for dense access (viz. convolution with dense
input and sparse filter).
Some unit tests and an integration test demonstrate the new capability.

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D109783
2021-09-15 20:28:04 -07:00
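A hypothetical 1-D convolution kernel with a compound x(i+j) subscript on the dense input and a sparse filter, written in the style of the sparse compiler's `linalg.generic` traits (encoding, shapes, and names are illustrative).

```mlir
#SV = #sparse_tensor.encoding<{ dimLevelType = [ "compressed" ] }>

#conv1d = {
  indexing_maps = [
    affine_map<(i, j) -> (i + j)>,  // dense input, compound subscript
    affine_map<(i, j) -> (j)>,      // sparse filter
    affine_map<(i, j) -> (i)>       // output
  ],
  iterator_types = ["parallel", "reduction"]
}

func @conv1d(%in: tensor<10xf32>, %filter: tensor<3xf32, #SV>,
             %out: tensor<8xf32>) -> tensor<8xf32> {
  %0 = linalg.generic #conv1d
      ins(%in, %filter : tensor<10xf32>, tensor<3xf32, #SV>)
      outs(%out : tensor<8xf32>) {
  ^bb0(%a: f32, %b: f32, %c: f32):
    %m = mulf %a, %b : f32
    %s = addf %c, %m : f32
    linalg.yield %s : f32
  } -> tensor<8xf32>
  return %0 : tensor<8xf32>
}
```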
Rob Suderman 1ac2d195ec [mlir][linalg] Add canonicalizers for depthwise conv
There are two main versions of depthwise conv, depending on whether the multiplier
is 1 or not. In cases where m == 1, we should use the version without the
multiplier channel, as it allows greater optimization.

Add lowering for the quantized/float versions to have a multiplier of one.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D108959
2021-09-15 14:09:15 -07:00
Simon Camphausen 1b79efdc72 [mlir] Fix printing of EmitC attrs/types with escape characters
Attributes and types were not escaped when printing.

Reviewed By: jpienaar, marbre

Differential Revision: https://reviews.llvm.org/D109143
2021-09-15 18:15:38 +00:00
Nicolas Vasilache 96ec0ff2b7 [mlir][Linalg] Revisit insertion points in comprehensive bufferization.
This revision fixes a corner case that could appear due to incorrect insertion point behavior in comprehensive bufferization.

Differential Revision: https://reviews.llvm.org/D109830
2021-09-15 18:11:38 +00:00
Nicolas Vasilache 6fe77b1051 [mlir][Linalg] Fail comprehensive bufferization if a memref is returned.
Differential revision: https://reviews.llvm.org/D109824
2021-09-15 15:11:17 +00:00
Tobias Gysi 44a889778c [mlir][linalg] Fold ExtractSliceOps during tiling.
Add the makeComposedExtractSliceOp method that creates an ExtractSliceOp and folds chains of ExtractSliceOps by computing the sum of their offsets and by multiplying their strides.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D109601
2021-09-14 11:43:52 +00:00
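The IR-level effect of that folding, shown on a hypothetical chain of two slices with unit strides (makeComposedExtractSliceOp itself is a C++ helper):

```mlir
// Chained slices: %b extracts from %a, which extracts from %t.
func @chained_slices(%t: tensor<?x?xf32>, %i: index, %j: index)
    -> tensor<4x4xf32> {
  %a = tensor.extract_slice %t[%i, %j] [16, 16] [1, 1]
      : tensor<?x?xf32> to tensor<16x16xf32>
  %b = tensor.extract_slice %a[2, 3] [4, 4] [1, 1]
      : tensor<16x16xf32> to tensor<4x4xf32>
  // makeComposedExtractSliceOp would instead build a single slice of %t at
  // offsets (%i + 2, %j + 3) with the same 4x4 size and unit strides.
  return %b : tensor<4x4xf32>
}
```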
Matthias Springer 62883459cd [mlir][linalg] makeTiledShape: No affine.min if tile size == 1
This improves codegen (more static type information) with `scalarize-dynamic-dims`.

Differential Revision: https://reviews.llvm.org/D109415
2021-09-14 10:48:20 +09:00