After removing the range type, Linalg does not define any type. The revision thus consolidates the LinalgOps.h and LinalgTypes.h into a single Linalg.h header. Additionally, LinalgTypes.cpp is renamed to LinalgDialect.cpp to follow the convention adopted by other dialects such as the tensor dialect.
Depends On D115727
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115728
Break up the vectorization pre-condition into the part checking for
static shape and the rest checking if the linalg op is supported by
vectorization. This allows checking if an op could be vectorized if it
had static shapes.
Differential Revision: https://reviews.llvm.org/D115754
Tighten the matcher of the PadTensorOpVectorizationWithInsertSlicePattern pattern. Only match if the PadOp result is used by the InsertSliceOp source. Fail if the result is used by the InsertSliceOp dest.
Depends On D115336
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115359
This patch fixes the build by removing
extractVectorTypeFromShapedValue. The last use was removed Dec 1,
2021 in commit extractVectorTypeFromShapedValue.
This revision adds 0-d vector support to vector.transfer ops.
In the process, numerous cleanups are applied, in particular around normalizing
and reducing the number of builders.
Reviewed By: ThomasRaoux, springerm
Differential Revision: https://reviews.llvm.org/D114803
When trying to connect the vectorization of depthwise convolutions to e2e execution
a number of problems surfaced.
Fix an off-by-one error on the size of the input vector (similary to what was previously done for regular conv).
Rewrite the lowering to vector.fma instead of vector.contract: the KW reduction dimension has already been unrolled and vector.contract requires a reduction dimension to be valid.
Differential Revision: https://reviews.llvm.org/D113884
At this time the 2 flavors of conv are a little too different to allow significant code sharing and other will likely come up.
so we go the easy route first by duplicating and adapting.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D113758
This better decouples transfer read/write from vector-only rewrite of conv.
This form is close to ready to plop into a new vector.conv op and the vector.transfer operations to be generalized as part of generic vectorization once the properties ConvolutionOpInterface are inferred from the indexing maps.
This also results in a nice perf boost in the dw == 1 cases.
Differential revision: https://reviews.llvm.org/D112822
This refactoring prepares conv1d vectorization for a future integration into
the generic codegen path.
Once transfer_read / transfer_write vectorization also supports sliding windows,
the special pattern for conv can disappear.
This will also likely need a vector.conv operation.
Differential Revision: https://reviews.llvm.org/D112797
Handle contraction op like all the other generic op reductions. This
simpifies the code. We now rely on contractionOp canonicalization to
keep the same code quality.
Differential Revision: https://reviews.llvm.org/D112171
In the stride == 1 case, conv1d reads contiguous data along the input dimension. This can be advantageaously used to bulk memory transfers and compute while avoiding unrolling. Experimentally, this can yield speedups of up to 50%.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D112139
This revision uses the newly refactored StructuredGenerator to create a simple vectorization for conv1d_nwc_wcf.
Note that the pattern is not specific to the op and is technically not even specific to the ConvolutionOpInterface (modulo minor details related to dilations and strides).
The overall design follows the same ideas as the lowering of vector::ContractionOp -> vector::OuterProduct: it seeks to be minimally complex, composable and extensible while avoiding inference analysis. Instead, we metaprogram the maps/indexings we expect and we match against them.
This is just a first stab and still needs to be evaluated for performance.
Other tradeoffs are possible that should be explored.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D111894
Emit reduction during op vectorization instead of doing it when creating the
transfer write. This allow us to not broadcast output arguments for reduction
initial value.
Differential Revision: https://reviews.llvm.org/D111825
After removing the last LinalgOps that have no region attached we can verify there is a region. The patch performs the following changes:
- Move the SingleBlockImplicitTerminator trait further up the the structured op base class.
- Adapt the LinalgOp verification since the trait only check if there is 0 or 1 block.
- Introduce a getBlock method on the LinalgOp interface.
- Access the LinalgOp body using either getBlock() or getBody() if the concrete operation type is known.
This patch is a follow up to https://reviews.llvm.org/D111233.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D111393
Precursor: https://reviews.llvm.org/D110200
Removed redundant ops from the standard dialect that were moved to the
`arith` or `math` dialects.
Renamed all instances of operations in the codebase and in tests.
Reviewed By: rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D110797
We shouldn't broadcast the original value when doing reduction. Instead
we compute the reduction and then combine it with the original value.
Differential Revision: https://reviews.llvm.org/D111666
This patch teaches `isProjectedPermutation` and `inverseAndBroadcastProjectedPermutation`
utilities to deal with maps representing an explicit broadcast, e.g., (d0, d1) -> (d0, 0).
This extension is needed to enable vectorization of such explicit broadcast in Linalg.
Reviewed By: pifon2a, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D111563
This revision takes advantage of the recently added support for 0-d transfers and vector.multi_reduction that return a scalar.
Reviewed By: pifon2a
Differential Revision: https://reviews.llvm.org/D111626
The convolution op is one of the remaining hard coded Linalg operations that have no region attached. It got obsolete due to the OpDSL convolution operations. Removing it allows us to delete specialized code and tests that are not needed for the OpDSL counterparts that rely on the standard code paths.
Test needed due to specialized implementations are removed. Tiling and fusion tests are replaced by variants using linalg.conv_2d.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D111233
This patch extends Linalg core vectorization with support for min/max reductions
in linalg.generic ops. It enables the reduction detection for min/max combiner ops.
It also renames MIN/MAX combining kinds to MINS/MAXS to make the sign explicit for
floating point and signed integer types. MINU/MAXU should be introduce din the future
for unsigned integer types.
Reviewed By: pifon2a, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D110854
This patch introduces a generic reduction detection utility that works
across different dialecs. It is mostly a generalization of the reduction
detection algorithm in Affine. The reduction detection logic in Affine,
Linalg and SCFToOpenMP have been replaced with this new generic utility.
The utility takes some basic components of the potential reduction and
returns: 1) the reduced value, and 2) a list with the combiner operations.
The logic to match reductions involving multiple combiner operations disabled
until we can properly test it.
Reviewed By: ftynse, bondhugula, nicolasvasilache, pifon2a
Differential Revision: https://reviews.llvm.org/D110303
Multiple operations were still defined as TC ops that had equivalent versions
as YAML operations. Reducing to a single compilation path guarantees that
frontends can lower to their equivalent operations without missing the
optimized fastpath.
Some operations are maintained purely for testing purposes (mainly conv{1,2,3}D
as they are included as sole tests in the vectorizaiton transforms.
Differential Revision: https://reviews.llvm.org/D108169
These operations are not lowered to from any source dialect and are only
used for redundant tests. Removing these named ops, along with their
associated tests, will make migration to YAML operations much more
convenient.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D107993
When the output indexing map has a permutation we need to consider in
the contraction vector type.
Differential Revision: https://reviews.llvm.org/D106469
Insert ops replacing pad_tensor in front of the associated tansfer_write / insert_slice op. Otherwise we may end up with invalid ir if one of the remaining tansfer_write / insert_slice operands is defined after the pad_tensor op.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D106162
Refactor the original code to rewrite a PadTensorOp into a
sequence of InitTensorOp, FillOp and InsertSliceOp without
vectorization by default. `GenericPadTensorOpVectorizationPattern`
provides a customized OptimizeCopyFn to vectorize the
copying step.
Reviewed By: silvas, nicolasvasilache, springerm
Differential Revision: https://reviews.llvm.org/D105293
* Split memref.dim into two operations: memref.dim and tensor.dim. Both ops have the same builder interface and op argument names, so that they can be used with templates in patterns that apply to both tensors and memrefs (e.g., some patterns in Linalg).
* Add constant materializer to TensorDialect (needed for folding in affine.apply etc.).
* Remove some MemRefDialect dependencies, make some explicit.
Differential Revision: https://reviews.llvm.org/D105165
Reduce code duplication: Move various helper functions, that are duplicated in TensorDialect, MemRefDialect, LinalgDialect, StandardDialect, into a new StaticValueUtils.cpp.
Differential Revision: https://reviews.llvm.org/D104687
The main goal of this commit is to remove the dependency of Standard dialect on the Tensor dialect.
* Rename SubTensorOp -> tensor.extract_slice, SubTensorInsertOp -> tensor.insert_slice.
* Some helper functions are (already) duplicated between the Tensor dialect and the MemRef dialect. To keep this commit smaller, this will be cleaned up in a separate commit.
* Additional dialect dependencies: Shape --> Tensor, Tensor --> Standard
* Remove dialect dependencies: Standard --> Tensor
* Move canonicalization test cases to correct dialect (Tensor/MemRef).
Note: This is a fixed version of https://reviews.llvm.org/D104499, which was reverted due to a missing update to two CMakeFile.txt.
Differential Revision: https://reviews.llvm.org/D104676
The main goal of this commit is to remove the dependency of Standard dialect on the Tensor dialect.
* Rename ops: SubTensorOp --> ExtractTensorOp, SubTensorInsertOp --> InsertTensorOp
* Some helper functions are (already) duplicated between the Tensor dialect and the MemRef dialect. To keep this commit smaller, this will be cleaned up in a separate commit.
* Additional dialect dependencies: Shape --> Tensor, Tensor --> Standard
* Remove dialect dependencies: Standard --> Tensor
* Move canonicalization test cases to correct dialect (Tensor/MemRef).
Differential Revision: https://reviews.llvm.org/D104499
There's no need for `toSmallVector()` as `SmallVector.h` already provides a `to_vector` free function that takes a range.
Reviewed By: Quuxplusone
Differential Revision: https://reviews.llvm.org/D104024
Up to now all structured op operands are assumed to be shaped. The patch relaxes this assumption and allows scalar input operands. In contrast to shaped operands scalar operands are not indexed and directly forwarded to the body of the operation. As all other operands, scalar operands are associated to an indexing map that in case of a scalar or a 0D-operand has an empty range.
We will use scalar operands as a replacement for the capture mechanism. In contrast to captures, the approach ensures we can generate the function signature from the operand list and it prevents outdated capture values in case a transformation updates only the capture operand but not the hidden body of a named operation.
Removing captures and updating existing operations such as linalg.fill is left for a later patch.
The patch depends on https://reviews.llvm.org/D103891 and https://reviews.llvm.org/D103890.
Differential Revision: https://reviews.llvm.org/D104109
The padding of such ops is not generated in a vectorized way. Instead, emit a tensor::GenerateOp.
We may vectorize GenerateOps in the future.
Differential Revision: https://reviews.llvm.org/D103879
If the source operand of a linalg.pad_op operation has static shape, vectorize the copying of the source.
Differential Revision: https://reviews.llvm.org/D103747
Currently limited to constant pad values. Any combination of dynamic/static tensor sizes and padding sizes is supported.
Differential Revision: https://reviews.llvm.org/D103679
The generic vectorization pattern handles only those cases, where
low and high padding is zero. This is already handled by a
canonicalization pattern.
Also add a new canonicalization test case to ensure that tensor cast ops
are properly inserted.
A more general vectorization pattern will be added in a subsequent commit.
Differential Revision: https://reviews.llvm.org/D103590
Vectorize linalg.pad_tensor without generating a linalg.init_tensor when consumed by a transfer_write.
Differential Revision: https://reviews.llvm.org/D103137
Vectorize linalg.pad_tensor without generating a linalg.init_tensor when consumed by a subtensor_insert.
Differential Revision: https://reviews.llvm.org/D103780