Commit Graph

9341 Commits

Author SHA1 Message Date
Chia-hung Duan 2afd16fe72 [mlir] Enable MLIRDialectUtilsTests
Also remove `TooFewDims` test which tried to create an invalid AffineMap.
The creation of an invalid AffineMap is rejected by `willBeValidAffineMap`,
as a result we can deprecate the test.

Reviewed By: bondhugula

Differential Revision: https://reviews.llvm.org/D114657
2021-11-27 22:36:43 +00:00
Stanislav Funiak a19e163526 Fixed broken build under GCC 5.4.
This diff fixes broken build caused by D108550. Under GCC 5, auto lambdas that capture this require `this->` for member calls.

Reviewed By: bondhugula

Differential Revision: https://reviews.llvm.org/D114659
2021-11-27 09:03:27 +05:30
Kazu Hirata 803cec0268 [mlir] Fix a warning
This patch fixes:

  mlir/lib/IR/MLIRContext.cpp:1020:3: error: use of the 'nodiscard'
  attribute is a C++17 extension [-Werror,-Wc++17-extensions]
2021-11-26 12:27:11 -08:00
Arnab Dutta c2280b5517 [MLIR] Avoid creation of buggy affine maps when incorrect values of number of dimensions and number of symbols are provided.
We check whether the maximum index of dimensional identifier present
in the result expressions is less than dimCount (number of dimensional
identifiers) argument passed in the AffineMap::get() and the maximum index
of symbolic identifier present in the result expressions is less than
symbolCount (number of symbolic identifiers) argument passed in AffineMap::get().

Reviewed By: nicolasvasilache, bondhugula

Differential Revision: https://reviews.llvm.org/D114238
2021-11-27 00:37:08 +05:30
Arnab Dutta e4e4da86af [MLIR] Prevent creation of buggy affine map after linearizing collapsed dimensions of source map
Initially we were passing wrong numSymbols argument while calling
AffineMap::get() for creaating affine map with linearized result
expressions. The main problems was the number of symbols of the newly
to be created map may be different from that of the source map, as
new symbolic identifiers may be introduced while creating strided layout
linearized expressions.

Reviewed By: nicolasvasilache, bondhugula

Differential Revision: https://reviews.llvm.org/D114240
2021-11-27 00:32:58 +05:30
Chris Jones 344eee6f38 [MLIR] Allow `Idempotent` trait to be applied to binary ops.
Add `Idempotent` trait to `arith.{andi,ori}`.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D114574
2021-11-26 18:22:49 +00:00
Michal Terepeta 7e65fc9a60 [mlir][Vector] Support 0-D vectors in `BroadcastOp`
This changes the op to produce `AnyVectorOfAnyRank` following mostly the code for 1-D vectors.

Depends On D114598

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114550
2021-11-26 17:17:18 +00:00
Michal Terepeta d0f927121e [mlir][Standard] Support 0-D vectors in `SplatOp`
This changes the op to produce `AnyVectorOfAnyRank` and implements this by just
inserting the element (skipping the shuffle that we do for the 1-D case).

Depends On D114549

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114598
2021-11-26 17:05:15 +00:00
Arjun P ad34ce94d5 [MLIR] Simplex: fix a bug when rolling back a Simplex with no solutions
Previously, when adding a constraint to a Simplex that is already marked
as having no solutions (marked empty), the Simplex would be marked empty again,
and a second UnmarkEmpty entry would be pushed to the undo log. When rolling
back, Simplex should be unmarked empty only after rolling back past the
creation of the first constraint that made it empty.

Reviewed By: Groverkss

Differential Revision: https://reviews.llvm.org/D114613
2021-11-26 22:33:48 +05:30
Arjun P f074bbb04a [MLIR] Simplex::pivot: also update the redundant rows when pivoting
Previously, the pivot function would only update the non-redundant rows when
pivoting. This is incorrect because in some cases, when rolling back past a
`detectRedundant` call, the basis being used could be different from that which
was used at the time of returning from the `detectRedundant` call. Therefore,
it is important to update the redundant rows as well during pivots. This could
also be triggered by pivots that occur when testing successive constraints for
being redundant in `detectRedundant` after some initial constraints are marked redundant.

Reviewed By: Groverkss

Differential Revision: https://reviews.llvm.org/D114614
2021-11-26 21:42:41 +05:30
Mats Petersson 30238c3676 [mlir][OpenMP] Add support for SIMD modifier
Add support for SIMD modifier in OpenMP worksharing loops.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D111051
2021-11-26 14:04:46 +00:00
Matthias Springer b62b21b980 [mlir][linalg][bufferize][NFC] InsertSliceOp no-copy detection as PostAnalysis
There is special logic for InsertSliceOp to check if a memcpy is needed. This change extracts that piece of code and makes it a PostAnalysisStep.

The purpose of this change is to untangle `bufferize` from BufferizationAliasInfo. (Not fully there yet.)

Differential Revision: https://reviews.llvm.org/D114513
2021-11-26 22:19:29 +09:00
Benjamin Kramer 8521850f20 Provide a definition for OperationPosition::kDown
This isn't necessary in C++17, but C++14 still requires it.
2021-11-26 14:11:59 +01:00
Benjamin Kramer 1b0312d280 [PDL] fix unused variable warning in Release builds 2021-11-26 14:11:58 +01:00
Stanislav Funiak d35f119094 Added line numbers to the debug output of PDL bytecode.
This is a small diff that splits out the debug output for PDL bytecode. When running bytecode with debug output on, it is useful to know the line numbers where the PDLIntepr operations are performed. Usually, these are in a single MLIR file, so it's sufficient to print out the line number rather than the entire location (which tends to be quite verbose). This debug output is gated by `LLVM_DEBUG` rather than `#ifndef NDEBUG` to make it easier to test.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D114061
2021-11-26 18:11:37 +05:30
Stanislav Funiak a76ee58f3c Multi-root PDL matching using upward traversals.
This is commit 4 of 4 for the multi-root matching in PDL, discussed in https://llvm.discourse.group/t/rfc-multi-root-pdl-patterns-for-kernel-matching/4148 (topic flagged for review).

This PR integrates the various components (root ordering algorithm, nondeterministic execution of PDL bytecode) to implement multi-root PDL matching. The main idea is for the pattern to specify mulitple candidate roots. The PDL-to-PDLInterp lowering selects one of these roots and "hangs" the pattern from this root, traversing the edges downwards (from operation to its operands) when possible and upwards (from values to its uses) when needed. The root is selected by invoking the optimal matching multiple times, once for each candidate root, and the connectors are determined form the optimal matching. The costs in the directed graph are equal to the number of upward edges that need to be traversed when connecting the given two candidate roots. It can be shown that, for this choice of the cost function, "hanging" the pattern an inner node is no better than from the optimal root.

The following three main additions were implemented as a part of this PR:
1. OperationPos predicate has been extended to allow tracing the operation accepting a value (the opposite of operation defining a value).
2. Predicate checking if two values are not equal - this is useful to ensure that we do not traverse the edge back downwards after we traversed it upwards.
3. Function for for building the cost graph among the candidate roots.
4. Updated buildPredicateList, building the predicates optimal branching has been determined.

Testing: unit tests (an integration test to follow once the stack of commits has landed)

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D108550
2021-11-26 18:11:37 +05:30
Stanislav Funiak 6df7cc7f47 Implementation of the root ordering algorithm
This is commit 3 of 4 for the multi-root matching in PDL, discussed in https://llvm.discourse.group/t/rfc-multi-root-pdl-patterns-for-kernel-matching/4148 (topic flagged for review).

We form a graph over the specified roots, provided in `pdl.rewrite`, where two roots are connected by a directed edge if the target root can be connected (via a chain of operations) in the underlying pattern to the source root. We place a restriction that the path connecting the two candidate roots must only contain the nodes in the subgraphs underneath these two roots. The cost of an edge is the smallest number of upward traversals (edges) required to go from the source to the target root, and the connector is a `Value` in the intersection of the two subtrees rooted at the source and target root that results in that smallest number of such upward traversals. Optimal root ordering is then formulated as the problem of finding a spanning arborescence (i.e., a directed spanning tree) of minimal weight.

In order to determine the spanning arborescence (directed spanning tree) of minimum weight, we use the [Edmonds' algorithm](https://en.wikipedia.org/wiki/Edmonds%27_algorithm). The worst-case computational complexity of this algorithm is O(_N_^3) for a single root, where _N_ is the number of specified roots. The `pdl`-to-`pdl_interp` lowering calls this algorithm as a subroutine _N_ times (once for each candidate root), so the overall complexity of root ordering is O(_N_^4). If needed, this complexity could be reduced to O(_N_^3) with a more efficient algorithm. However, note that the underlying implementation is very efficient, and _N_ in our instances tends to be very small (<10). Therefore, we believe that the proposed (asymptotically suboptimal) implementation will suffice for now.

Testing: a unit test of the algorithm

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D108549
2021-11-26 18:11:37 +05:30
Stanislav Funiak 3eb1647af0 Introduced iterative bytecode execution.
This is commit 2 of 4 for the multi-root matching in PDL, discussed in https://llvm.discourse.group/t/rfc-multi-root-pdl-patterns-for-kernel-matching/4148 (topic flagged for review).

This commit implements the features needed for the execution of the new operations pdl_interp.get_accepting_ops, pdl_interp.choose_op:
1. The implementation of the generation and execution of the two ops.
2. The addition of Stack of bytecode positions within the ByteCodeExecutor. This is needed because in pdl_interp.choose_op, we iterate over the values returned by pdl_interp.get_accepting_ops until we reach finalize. When we reach finalize, we need to return back to the position marked in the stack.
3. The functionality to extend the lifetime of values that cross the nondeterministic choice. The existing bytecode generator allocates the values to memory positions by representing the liveness of values as a collection of disjoint intervals over the matcher positions. This is akin to register allocation, and substantially reduces the footprint of the bytecode executor. However, because with iterative operation pdl_interp.choose_op, execution "returns" back, so any values whose original liveness cross the nondeterminstic choice must have their lifetime executed until finalize.

Testing: pdl-bytecode.mlir test

Reviewed By: rriddle, Mogball

Differential Revision: https://reviews.llvm.org/D108547
2021-11-26 18:11:37 +05:30
Stanislav Funiak 842b6861c0 Defines new PDLInterp operations needed for multi-root matching in PDL.
This is commit 1 of 4 for the multi-root matching in PDL, discussed in https://llvm.discourse.group/t/rfc-multi-root-pdl-patterns-for-kernel-matching/4148 (topic flagged for review).

These operations are:
* pdl.get_accepting_ops: Returns a list of operations accepting the given value or a range of values at the specified position. Thus if there are two operations `%op1 = "foo"(%val)` and `%op2 = "bar"(%val)` accepting a value at position 0, `%ops = pdl_interp.get_accepting_ops of %val : !pdl.value at 0` will return both of them. This allows us to traverse upwards from a value to operations accepting the value.
* pdl.choose_op: Iteratively chooses one operation from a range of operations. Therefore, writing `%op = pdl_interp.choose_op from %ops` in the example above will select either `%op1`or `%op2`.

Testing: Added the corresponding test cases to mlir/test/Dialect/PDLInterp/ops.mlir.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D108543
2021-11-26 17:59:22 +05:30
Tobias Gysi 8d07ba817c [mlir][linalg] Simplify the hoist padding tests.
Use primarily matvec instead of matmul to test hoist padding. Test the hoisting only starting from already padded IR. Use one-dimensional tiling only except for the tile_and_fuse test that exercises hoisting on a larger loop nest with fill and pad tensor operations in the backward slice.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114608
2021-11-26 07:40:22 +00:00
Michal Terepeta c47108c041 [mlir][Vector] Minor formatting fixes in Vector.md
Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D113854
2021-11-26 07:16:07 +00:00
Matthias Springer 8e2214aa60 [mlir][linalg][bufferize][NFC] Pass BufferizationState to PostAnalysisStep
Pass BufferizationStep instead of BufferizationAliasInfo. Note: BufferizationState contains BufferizationAliasInfo.

Differential Revision: https://reviews.llvm.org/D114512
2021-11-26 11:46:14 +09:00
Matthias Springer d62b4b08af [mlir][linalg][bufferize] Compose dialect-specific bufferization state
Use composition instead of inheritance for storing dialect-specific bufferization state. This is in preparation of adding "tensor dialect"-specific bufferization state.

Differential Revision: https://reviews.llvm.org/D114508
2021-11-26 11:35:45 +09:00
Matthias Springer c94b80b438 [mlir][linalg][bufferize][NFC] Allow returning arbitrary memrefs
If `allowReturnMemref` is set to true, arbitrary memrefs may be returned from FuncOps. Also remove allocation hoisting code, which is only partly implemented at the moment.

The purpose of this commit is to untangle `bufferize` from `aliasInfo`. (Even with this change, they are not fully untangled yet.)

Differential Revision: https://reviews.llvm.org/D114507
2021-11-26 11:26:46 +09:00
Matthias Springer c637e3ea9e [mlir][linalg][bufferize][NFC] Extract func boundary bufferization
Bufferization of function boundaries is extracted from ComprehensiveBufferize into a separate file. This will become its own build target in the future.

Differential Revision: https://reviews.llvm.org/D114226
2021-11-26 10:25:36 +09:00
Matthias Springer f32c3d9528 [mlir][linalg][bufferize][NFC] Move Affine interface impl to new build target
This makes ComprehensiveBufferize entirely independent of the Affine dialect.

Differential Revision: https://reviews.llvm.org/D114222
2021-11-26 09:27:47 +09:00
Mehdi Amini 850e8b4504 Fix link to the other docs from the Bufferization dialect 2021-11-26 00:13:32 +00:00
Michal Terepeta cc311a155a [mlir][Vector] Support 0-D vectors in `VectorPrintOpConversion`
Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114549
2021-11-25 20:12:18 +00:00
Uday Bondhugula c89fc1eec3 [MLIR] NFC. Rename MLIR CAPI ExecutionEngine target for consistency
Rename MLIR CAPI ExecutionEngine target for consistency:
MLIRCEXECUTIONENGINE -> MLIRCAPIExecutionEngine in line with other
targets.

Differential Revision: https://reviews.llvm.org/D114596
2021-11-26 00:23:17 +05:30
Tres Popp 6eca1957ee Don't store nullptrs in mlir::FuncOp::getAll*Attrs' result
These methods for results and arguments would create an ArrayRef full
of nullptrs when there were no argument attributes. This is problematic
because this result could not be passed to the FuncOp::build creator
without causing a segfault. Now the list will have empty attributes.

Differential Revision: https://reviews.llvm.org/D114358
2021-11-25 15:12:29 +01:00
seongwon bang 35c1e6ac1a [MLIR] [docs] Fix misguided examples in memref.subview operation.
The examples in `memref.subview` operation are misguided in that subview's strides operands mean "memref-rank number of strides that compose multiplicatively with the base memref strides in each dimension.".
So the below examples should be changed from `Strides: [64, 4, 1]` to `Strides: [1, 1, 1]`

Before changes
```
// Subview with constant offsets, sizes and strides.
%1 = memref.subview %0[0, 2, 0][4, 4, 4][64, 4, 1]
  : memref<8x16x4xf32, (d0, d1, d2) -> (d0 * 64 + d1 * 4 + d2)> to
    memref<4x4x4xf32, (d0, d1, d2) -> (d0 * 64 + d1 * 4 + d2 + 8)>
```

After changes
```
// Subview with constant offsets, sizes and strides.
%1 = memref.subview %0[0, 2, 0][4, 4, 4][1, 1, 1]
  : memref<8x16x4xf32, (d0, d1, d2) -> (d0 * 64 + d1 * 4 + d2)> to
    memref<4x4x4xf32, (d0, d1, d2) -> (d0 * 64 + d1 * 4 + d2 + 8)>
```

Also I fixed some syntax issues in docs related with memref layout map and added detailed explanation in subview rank reducing case.

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D114500
2021-11-25 21:24:10 +09:00
Alexander Belyaev 57470abc41 [mlir] Move memref.[tensor_load|buffer_cast|clone] to "bufferization" dialect.
https://llvm.discourse.group/t/rfc-dialect-for-bufferization-related-ops/4712

Differential Revision: https://reviews.llvm.org/D114552
2021-11-25 11:50:39 +01:00
Tobias Gysi 43dc6d5d57 [mlir][linalg] Cleanup hoisting test (NFC).
Rename the check prefixes to HOIST21 and HOIST32 to clarify the different flag configurations.

Depends On D114438

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114442
2021-11-25 10:42:24 +00:00
Tobias Gysi 4b03906346 [mlir][linalg] Perform checks early in hoist padding.
Instead of checking for unexpected operations (any operation with a region except for scf::For and `padTensorOp` or operations with a memory effect) while cloning the packing loop nest perform the checks early. Update `dropNonIndexDependencies` to check for unexpected operations. Additionally, check all of these operations have index type operands only.

Depends On D114428

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114438
2021-11-25 10:37:12 +00:00
Tobias Gysi fd723eaa92 [mlir][linalg] Limit hoist padding to constant paddings.
Limit hoist padding to pad tensor ops that depend only on a constant value. Supporting arbitrary padding values that depend on computations part of the backward slice to hoist require complex analysis to ensure the computation can be hoisted.

Depends On D114420

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114428
2021-11-25 10:31:39 +00:00
Tobias Gysi ed7c1fb9b0 [mlir][linalg] Add backward slice filtering in hoist padding.
Adapt hoist padding to filter the backward slice before cloning the packing loop nest. The filtering removes all operations that are not used to index the hoisted pad tensor op and its extract slice op. The filtering is needed to support the more complex loop nests created after fusion. For example, fusing the producer of an output operand can added linalg ops and pad tensor ops to the backward slice. These operations have regions and currently prevent hoisting.

The following example demonstrates the effect of the newly introduced `dropNonIndexDependencies` method that filters the backward slice:
```
%source = linalg.fill(%cst, %arg0)
scf.for %i
  %unrelated = linalg.fill(%cst, %arg1)    // not used to index %source!
  scf.for %j (%arg2 = %unrelated)
    scf.for %k                             // not used to index %source!
      %ubi = affine.min #map(%i)
      %ubj = affine.min #map(%j)
      %slice = tensor.extract_slice %source [%i, %j] [%ubi, %ubj]
      %padded_slice = linalg.pad_tensor %slice
```
dropNonIndexDependencies(%padded_slice, %slice)
removes [scf.for %k, linalg.fill(%cst, %arg1)] from backwardSlice.

Depends On D114175

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114420
2021-11-25 10:30:10 +00:00
Matthias Springer 48107eaa07 [mlir][linalg][bufferize][NFC] Move SCF interface impl to new build target
This makes ComprehensiveBufferize entirely independent of the SCF dialect.

Differential Revision: https://reviews.llvm.org/D114221
2021-11-25 19:00:17 +09:00
Alexander Belyaev 3c228573bc Revert "[mlir][SCF] Further simplify affine maps during `for-loop-canonicalization`"
This reverts commit ee1bf18672.

It breaks IREE lowering. Reverting the commit for now while we
investigate what's going on.
2021-11-25 10:54:52 +01:00
Butygin 8dae0b6b6c [mlir][spirv] arith::RemSIOp OpenCL lowering
Differential Revision: https://reviews.llvm.org/D114524
2021-11-25 12:44:06 +03:00
Butygin 467acf3b6b [mlir][spirv] Float atomics should not imply Shader
Differential Revision: https://reviews.llvm.org/D114551
2021-11-25 12:07:28 +03:00
Matthias Springer a5c2f78287 [mlir][interfaces] Add insideMutuallyExclusiveRegions helper
Add a helper function to ControlFlowInterfaces for checking if two ops
are in mutually exclusive regions according to RegionBranchOpInterface.

Utilize this new helper in Linalg ComprehensiveBufferize. This makes the
analysis independent of the SCF dialect and generalizes it to other ops
that implement RegionBranchOpInterface.

Differential Revision: https://reviews.llvm.org/D114220
2021-11-25 17:44:39 +09:00
Uday Bondhugula 25d173499e [MLIR] Rename test/python/dialects/math.py -> math_dialect.py
Rename test/python/dialects/math.py -> math_dialect.py to avoid a
collision with a Python standard package of the same name. These test
scripts are run by path and are not part of a package. Python apparently
implicitly adds the containing directory to its PYTHONPATH. As such,
test scripts with common names run the risk of conflicting with global
names and resolution of an import for the latter happens to the former.

Differential Revision: https://reviews.llvm.org/D114568
2021-11-25 09:51:49 +05:30
Matthias Springer ee1bf18672 [mlir][SCF] Further simplify affine maps during `for-loop-canonicalization`
* Implement `FlatAffineConstraints::getConstantBound(EQ)`.
* Inject a simpler constraint for loops that have at most 1 iteration.
* Taking into account constant EQ bounds of FlatAffineConstraint dims/symbols during canonicalization of the resulting affine map in `canonicalizeMinMaxOp`.

Differential Revision: https://reviews.llvm.org/D114138
2021-11-25 12:44:19 +09:00
Matthias Springer 8a8c655fe7 [mlir][SCF] Fix off-by-one bug in affine analysis
This change is NFC. There were two issues when passing/reading upper bounds into/from FlatAffineConstraints that negate each other, so the bug was not apparent. However, it made debugging harder because some constraints in the FlatAffineConstraints were off by one when dumping all constraints.

Differential Revision: https://reviews.llvm.org/D114137
2021-11-25 12:37:02 +09:00
Uday Bondhugula 23d505571d [NFC] Improve debug message in getAsIntegerSet
Improve debug message in getAsIntegerSet. Add missing trailing new line
and position info.

Differential Revision: https://reviews.llvm.org/D114511
2021-11-25 08:50:21 +05:30
Matthias Springer d3bb4fec2a [mlir][linalg][bufferize][NFC] Move arith interface impl to new build target
This makes ComprehensiveBufferize entirely independent of the arith dialect.

Differential Revision: https://reviews.llvm.org/D114219
2021-11-25 10:21:02 +09:00
bakhtiyar 7bd87a03fd Promote readability by factoring out creation of min/max operation. Remove unnecessary divisions.
Reviewed By: ezhulenev

Differential Revision: https://reviews.llvm.org/D110680
2021-11-24 16:17:23 -08:00
Lei Zhang cb395f66ac [mlir][spirv] Change the return type for {Min|Max}VersionBase
For synthesizing an op's implementation of the generated interface
from {Min|Max}Version, we need to define an `initializer` and
`mergeAction`. The `initializer` specifies the initial version,
and `mergeAction` specifies how version specifications from
different parts of the op should be merged to generate a final
version requirements.

Previously we use the specified version enum as the type for both
the initializer and thus the final return type. This means we need
to perform `static_cast` over some hopefully not used number (`~0u`)
as the initializer. This is quite opaque and sort of not guaranteed
to work. Also, there are ops that have an enum attribute where some
values declare version requirements (e.g., enumerant `B` requires
v1.1+) but some not (e.g., enumerant `A` requires nothing). Then a
concrete op instance with `A` will still declare it implements the
version interface (because interface implementation is static for
an op) but actually theirs no requirements for version.

So this commit changes to use an more explicit `llvm::Optional`
to wrap around the returned version enum.  This should make it
more clear.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D108312
2021-11-24 17:33:01 -05:00
Tobias Gysi b6e7b1be73 [mlir][linalg] Simplify padding test (NFC).
The padding tests previously contained the tile loops. This revision removes the tile loops since padding itself does not consider the loops. Instead the induction variables are passed in as function arguments which promotes them to symbols in the affine expressions. Note that the pad-and-hoist.mlir test still exercises padding in the context of the full loop nest.

Depends On D114175

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D114227
2021-11-24 19:21:50 +00:00
Tobias Gysi 86f186efea [mlir][linalg] Add makeComposedPadHighOp.
Add the makeComposedPadHighOp method which creates a new PadTensorOp if necessary. If the source to pad is actually the result of a sequence of padded LinalgOps, the method checks if padding is needed or if we can use the padded result of the padded LinalgOp sequence directly.

Example:
```
%0 = tensor.extract_slice %arg0 [%iv0, %iv1] [%sz0, %sz1]
%1 = linalg.pad_tensor %0 low[0, 0] high[...] { linalg.yield %cst }
%2 = linalg.matmul ins(...) outs(%1)
%3 = tensor.extract_slice %2 [0, 0] [%sz0, %sz1]
```
when padding %3 return %2 instead of introducing
```
%4 = linalg.pad_tensor %3 low[0, 0] high[...] { linalg.yield %cst }
```

Depends On D114161

Reviewed By: nicolasvasilache, pifon2a

Differential Revision: https://reviews.llvm.org/D114175
2021-11-24 19:18:59 +00:00