Clean up SuperVectorization dialect printing and parsing.

On the read side,
```
%3 = vector_transfer_read %arg0, %i2, %i1, %i0 {permutation_map: (d0, d1, d2)->(d2, d0)} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
```

becomes:

```
%3 = vector_transfer_read %arg0[%i2, %i1, %i0] {permutation_map: (d0, d1, d2)->(d2, d0)} : memref<?x?x?xf32>, vector<32x256xf32>
```
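Reads that carry the optional padding value print it in parentheses after the bracketed indices, e.g. (a sketch modeled on the updated op documentation below; `%f0` stands for an assumed f32 padding operand):

```
%3 = vector_transfer_read %arg0[%i2, %i1, %i0], (%f0) {permutation_map: (d0, d1, d2)->(d2, d0)} : memref<?x?x?xf32>, vector<32x256xf32>
```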

On the write side,

```
vector_transfer_write %0, %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d0)} : vector<128xf32>, memref<?x?xf32>, index, index
```

becomes:

```
vector_transfer_write %0, %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0)} : vector<128xf32>, memref<?x?xf32>
```

Documentation will be cleaned up in a follow-up commit that also extracts a proper .md from the comments at the top of the file.

PiperOrigin-RevId: 241021879
Author: Nicolas Vasilache, 2019-03-29 11:48:20 -07:00 (committed by jpienaar)
Commit: c9d5f3418a (parent: a38792f7d1)
32 changed files with 467 additions and 523 deletions


@ -11,7 +11,7 @@ identifiers to enable powerful analysis and transformation. A symbolic
identifier can be bound to an SSA value that is either an argument to the
function, a value defined at the top level of that function (outside of all
loops and if instructions), the result of a
[`constant` operation](LangRef.md#'constant'-operation), or the result of an
[`constant` operation](../LangRef.md#'constant'-operation), or the result of an
[`affine.apply` operation](#'affine.apply'-operation) that recursively takes as
arguments any symbolic identifiers. Dimensions may be bound not only to anything
that a symbol is bound to, but also to induction variables of enclosing
@ -30,7 +30,7 @@ operation ::= ssa-id `=` `affine.apply` affine-map dim-and-symbol-use-list
```
The `affine.apply` instruction applies an
[affine mapping](LangRef.md#affine-expressions) to a list of SSA values,
[affine mapping](../LangRef.md#affine-expressions) to a list of SSA values,
yielding a single SSA value. The number of dimension and symbol arguments to
affine.apply must be equal to the respective number of dimensional and symbolic
inputs to the affine mapping; the `affine.apply` instruction always returns one
@ -64,7 +64,7 @@ The `affine.for` operation represents an affine loop nest. It has one region
containing its body. This region must contain one block that terminates with
[`affine.terminator`](#'affine.terminator"-operation). *Note:* when `affine.for`
is printed in custom format, the terminator is omitted. The block has one
argument of [`index`](LangRef.md#index-type) type that represents the induction
argument of [`index`](../LangRef.md#index-type) type that represents the induction
variable of the loop.
The `affine.for` operation executes its body a number of times iterating from a
@ -122,7 +122,7 @@ space defined by an integer set (a conjunction of affine constraints). A single
`affine.if` may end with an optional `else` clause.
The condition of the `affine.if` is represented by an
[integer set](LangRef.md#integer-sets) (a conjunction of affine constraints),
[integer set](../LangRef.md#integer-sets) (a conjunction of affine constraints),
and the SSA values bound to the dimensions and symbols in the integer set. The
[same restrictions](#restrictions-on-dimensions-and-symbols) hold for these SSA
values as for all bindings of SSA values to dimensions and symbols.
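For illustration, a minimal sketch of an `affine.if` guarded by an integer set (the set, SSA names, and surrounding context are assumed; integer sets are normally defined at the top level of the module):

```mlir
// Execute the body only for iterations where 10 <= %i <= %N - 1.
#set0 = (d0)[s0] : (d0 - 10 >= 0, s0 - d0 - 1 >= 0)

affine.if #set0(%i)[%N] {
  // then-block, executed when the constraints hold
} else {
  // optional else-block
}
```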
@ -167,7 +167,7 @@ loops ([`for`](#'for'-operation)) and branches ([`if`](#'if'-operation)). It
unconditionally transmits the control flow to the successor of the operation
enclosing the region.
*Rationale*: bodies of affine operations are [blocks](LangRef.md#block) that
*Rationale*: bodies of affine operations are [blocks](../LangRef.md#block) that
must have terminators. Loops and branches represent structured control flow and
should not accept arbitrary branches as terminators.


@ -1,4 +1,4 @@
# SuperVector Dialect
# Vector Dialect
This dialect provides mid-level abstraction for the MLIR super-vectorizer.
@ -8,12 +8,12 @@ This dialect provides mid-level abstraction for the MLIR super-vectorizer.
### Vector transfers {#vector-transfers}
#### `vector_transfer_read` operation {#'vector_transfer_read'-operation}
#### `vector.transfer_read` operation {#'vector.transfer_read'-operation}
Syntax:
``` {.ebnf}
operation ::= ssa-id `=` `vector_transfer_read` ssa-use-list `{` attribute-entry `} :` function-type
operation ::= ssa-id `=` `vector.transfer_read` ssa-use-list `{` attribute-entry `} :` function-type
```
Examples:
@ -25,9 +25,9 @@ Examples:
for %i0 = 0 to %0 {
affine.for %i1 = 0 to %1 step 256 {
affine.for %i2 = 0 to %2 step 32 {
%v = vector_transfer_read %A, %i0, %i1, %i2, %f0
%v = vector.transfer_read %A[%i0, %i1, %i2], (%f0)
{permutation_map: (d0, d1, d2) -> (d2, d1)} :
(memref<?x?x?xf32>, index, index, f32) -> vector<32x256xf32>
memref<?x?x?xf32>, vector<32x256xf32>
}}}
// Read the slice `%A[%i0, %i1]` (i.e. the element `%A[%i0, %i1]`) into
@ -35,36 +35,36 @@ for %i0 = 0 to %0 {
// broadcast:
for %i0 = 0 to %0 {
affine.for %i1 = 0 to %1 {
%3 = vector_transfer_read %A, %i0, %i1
%3 = vector.transfer_read %A[%i0, %i1]
{permutation_map: (d0, d1) -> (0)} :
(memref<?x?xf32>, index, index) -> vector<128xf32>
memref<?x?xf32>, vector<128xf32>
}
}
```
The `vector_transfer_read` performs a blocking read from a slice within a scalar
[MemRef](#memref-type) supplied as its first operand into a
[vector](#vector-type) of the same elemental type. The slice is further defined
by a full-rank index within the MemRef, supplied as the operands `2 .. 1 +
rank(memref)`. The permutation_map [attribute](#attributes) is an
[affine-map](#affine-maps) which specifies the transposition on the slice to
match the vector shape. The size of the slice is specified by the size of the
vector, given as the return type. Optionally, an `ssa-value` of the same
elemental type as the MemRef is provided as the last operand to specify padding
in the case of out-of-bounds accesses. Absence of the optional padding value
signifies the `vector_transfer_read` is statically guaranteed to remain within
the MemRef bounds. This operation is called 'read' by opposition to 'load'
because the super-vector granularity is generally not representable with a
single hardware register. A `vector_transfer_read` is thus a mid-level
The `vector.transfer_read` performs a blocking read from a slice within a scalar
[MemRef](../LangRef.md#memref-type) supplied as its first operand into a
[vector](../LangRef.md#vector-type) of the same elemental type. The slice is
further defined by a full-rank index within the MemRef, supplied as the operands
`2 .. 1 + rank(memref)`. The permutation_map [attribute](../LangRef.md#attributes)
is an [affine-map](../LangRef.md#affine-maps) which specifies the transposition on
the slice to match the vector shape. The size of the slice is specified by the
size of the vector, given as the return type. Optionally, an `ssa-value` of the
same elemental type as the MemRef is provided as the last operand to specify
padding in the case of out-of-bounds accesses. Absence of the optional padding
value signifies the `vector.transfer_read` is statically guaranteed to remain
within the MemRef bounds. This operation is called 'read' by opposition to
'load' because the super-vector granularity is generally not representable with
a single hardware register. A `vector.transfer_read` is thus a mid-level
abstraction that supports super-vectorization with non-effecting padding for
full-tile-only code.
More precisely, let's dive deeper into the permutation_map for the following:
```mlir {.mlir}
vector_transfer_read %A, %expr1, %expr2, %expr3, %expr4
vector.transfer_read %A[%expr1, %expr2, %expr3, %expr4]
{ permutation_map : (d0,d1,d2,d3) -> (d2,0,d0) } :
(memref<?x?x?x?xf32>, index, index, index, index) -> vector<3x4x5xf32>
memref<?x?x?x?xf32>, vector<3x4x5xf32>
```
This operation always reads a slice starting at `%A[%expr1, %expr2, %expr3,
@ -74,7 +74,7 @@ This operation always reads a slice starting at `%A[%expr1, %expr2, %expr3,
That slice needs to be read into a `vector<3x4x5xf32>`. Since the permutation
map is not full rank, there must be a broadcast along vector dimension `1`.
A notional lowering of vector_transfer_read could generate code resembling:
A notional lowering of vector.transfer_read could generate code resembling:
```mlir {.mlir}
// %expr1, %expr2, %expr3, %expr4 defined before this point
@ -117,12 +117,12 @@ the same amount of data as the `3 * 5` values transferred. An additional `1`
broadcast is required. On a GPU this broadcast could be implemented using a
warp-shuffle if loop `j` were mapped to `threadIdx.x`.
#### `vector_transfer_write` operation {#'vector_transfer_write'-operation}
#### `vector.transfer_write` operation {#'vector.transfer_write'-operation}
Syntax:
``` {.ebnf}
operation ::= `vector_transfer_write` ssa-use-list `{` attribute-entry `} :` vector-type ', ' memref-type ', ' index-type-list
operation ::= `vector.transfer_write` ssa-use-list `{` attribute-entry `} :` vector-type ', ' memref-type ', ' index-type-list
```
Examples:
@ -134,45 +134,46 @@ for %i0 = 0 to %0 {
affine.for %i2 = 0 to %2 step 64 {
affine.for %i3 = 0 to %3 step 16 {
%val = `ssa-value` : vector<16x32x64xf32>
vector_transfer_write %val, %A, %i0, %i1, %i2, %i3
vector.transfer_write %val, %A[%i0, %i1, %i2, %i3]
{permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
vector<16x32x64xf32>, memref<?x?x?x?xf32>, index, index, index, index
vector<16x32x64xf32>, memref<?x?x?x?xf32>
}}}}
```
The `vector_transfer_write` performs a blocking write from a
[vector](#vector-type), supplied as its first operand, into a slice within a
scalar [MemRef](#memref-type) of the same elemental type, supplied as its second
operand. The slice is further defined by a full-rank index within the MemRef,
supplied as the operands `3 .. 2 + rank(memref)`. The permutation_map
[attribute](#attributes) is an [affine-map](#affine-maps) which specifies the
transposition on the slice to match the vector shape. The size of the slice is
specified by the size of the vector. This operation is called 'write' by
opposition to 'store' because the super-vector granularity is generally not
representable with a single hardware register. A `vector_transfer_write` is thus
a mid-level abstraction that supports super-vectorization with non-effecting
padding for full-tile-only code. It is the responsibility of
`vector_transfer_write`'s implementation to ensure the memory writes are valid.
Different lowerings may be pertinent depending on the hardware support.
The `vector.transfer_write` performs a blocking write from a
[vector](../LangRef.md#vector-type), supplied as its first operand, into a slice
within a scalar [MemRef](../LangRef.md#memref-type) of the same elemental type,
supplied as its second operand. The slice is further defined by a full-rank
index within the MemRef, supplied as the operands `3 .. 2 + rank(memref)`. The
permutation_map [attribute](../LangRef.md#attributes) is an
[affine-map](../LangRef.md#affine-maps) which specifies the transposition on the
slice to match the vector shape. The size of the slice is specified by the size
of the vector. This operation is called 'write' by opposition to 'store' because
the super-vector granularity is generally not representable with a single
hardware register. A `vector.transfer_write` is thus a mid-level abstraction
that supports super-vectorization with non-effecting padding for full-tile-only
code. It is the responsibility of `vector.transfer_write`'s implementation to
ensure the memory writes are valid. Different lowerings may be pertinent
depending on the hardware support.
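As a complementary sketch (assuming the new custom syntax), a write whose permutation_map transposes the two memref dimensions before storing the vector:

```mlir
// Store a vector<32x16xf32> at %B[%i, %j]: vector dimension 0 (size 32)
// runs along memref dimension d1, and vector dimension 1 (size 16) along d0.
vector.transfer_write %val, %B[%i, %j]
  {permutation_map: (d0, d1) -> (d1, d0)} :
  vector<32x16xf32>, memref<?x?xf32>
```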
### Vector views {#vector-views}
#### `vector_type_cast` operation {#'vector_type_cast'-operation}
#### `vector.type_cast` operation {#'vector.type_cast'-operation}
Syntax:
``` {.ebnf}
operation ::= `vector_type_cast` ssa-use : memref-type, memref-type
operation ::= `vector.type_cast` ssa-use : memref-type, memref-type
```
Examples:
```mlir
%A = alloc() : memref<5x4x3xf32>
%VA = vector_type_cast %A : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>>
%VA = vector.type_cast %A : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>>
```
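One possible use of the cast result (a sketch based on the EDSC builder pseudo-code later in this commit) is to read the whole memref content back as a single vector value:

```mlir
%c0 = constant 0 : index
// The cast memref holds a single vector<5x4x3xf32> element, so one load at
// index 0 yields the full content as a vector value.
%v = load %VA[%c0] : memref<1xvector<5x4x3xf32>>
```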
The `vector_type_cast` operation performs a conversion from a memref with scalar
The `vector.type_cast` operation performs a conversion from a memref with scalar
element to memref with a *single* vector element, copying the shape of the
memref to the vector. This is the minimal viable operation that is required to
make super-vectorization operational. It can be seen as a special case of the


@ -2211,7 +2211,7 @@ a conversion pass.
Currently, MLIR supports the following dialects:
* [Standard dialect](#standard-operations)
* [SuperVector dialect](Dialects/SuperVector.md)
* [Vector dialect](Dialects/Vector.md)
* [TensorFlow dialect](#tensorflow-operations)
### TensorFlow operations {#tensorflow-operations}


@ -89,13 +89,13 @@ shapeRatio(VectorType superVectorType, VectorType subVectorType);
/// affine.for %i3 = 0 to %0 step 32 {
/// affine.for %i4 = 0 to %1 {
/// affine.for %i5 = 0 to %2 step 256 {
/// %4 = vector_transfer_read %arg0, %i4, %i5, %i3
/// %4 = vector.transfer_read %arg0, %i4, %i5, %i3
/// {permutation_map: (d0, d1, d2) -> (d2, d1)} :
/// (memref<?x?x?xf32>, index, index) -> vector<32x256xf32>
/// }}}
/// ```
///
/// Meaning that vector_transfer_read will be responsible for reading the slice:
/// Meaning that vector.transfer_read will be responsible for reading the slice:
/// `%arg0[%i4, %i5:%i5+256, %i3:%i3+32]` into vector<32x256xf32>.
///
/// Example 2:
@ -112,13 +112,13 @@ shapeRatio(VectorType superVectorType, VectorType subVectorType);
///
/// ```mlir
/// affine.for %i0 = 0 to %0 step 128 {
/// %3 = vector_transfer_read %arg0, %c0_0, %c0_0
/// %3 = vector.transfer_read %arg0, %c0_0, %c0_0
/// {permutation_map: (d0, d1) -> (0)} :
/// (memref<?x?xf32>, index, index) -> vector<128xf32>
/// }
/// ````
///
/// Meaning that vector_transfer_read will be responsible of reading the slice
/// Meaning that vector.transfer_read will be responsible for reading the slice
/// `%arg0[%c0, %c0]` into vector<128xf32> which needs a 1-D vector broadcast.
///
AffineMap makePermutationMap(
@ -127,7 +127,7 @@ AffineMap makePermutationMap(
namespace matcher {
/// Matches vector_transfer_read, vector_transfer_write and ops that return a
/// Matches vector.transfer_read, vector.transfer_write and ops that return a
/// vector type that is a multiple of the sub-vector type. This allows passing
/// over other smaller vector types in the function and avoids interfering with
/// operations on those.
@ -135,7 +135,7 @@ namespace matcher {
/// TODO(ntv): this could all be much simpler if we added a bit that a vector
/// type to mark that a vector is a strict super-vector but it still does not
/// warrant adding even 1 extra bit in the IR for now.
bool operatesOnSuperVectors(Operation &op, VectorType subVectorType);
bool operatesOnSuperVectorsOf(Operation &op, VectorType subVectorType);
} // end namespace matcher
} // end namespace mlir


@ -26,7 +26,7 @@
#include "mlir/AffineOps/AffineOps.h"
#include "mlir/IR/Builders.h"
#include "mlir/StandardOps/Ops.h"
#include "mlir/SuperVectorOps/SuperVectorOps.h"
#include "mlir/VectorOps/VectorOps.h"
namespace mlir {


@ -325,7 +325,7 @@ protected:
/// ```mlir
/// Stmt scalarValue, vectorValue, tmpAlloc, tmpDealloc, vectorView;
/// tmpAlloc = alloc(tmpMemRefType);
/// vectorView = vector_type_cast(tmpAlloc, vectorMemRefType),
/// vectorView = vector.type_cast(tmpAlloc, vectorMemRefType),
/// vectorValue = load(vectorView, zero),
/// tmpDealloc = dealloc(tmpAlloc)});
/// emitter.emitStmts({tmpAlloc, vectorView, vectorValue, tmpDealloc});
@ -391,7 +391,7 @@ protected:
/// Stmt scalarValue, vectorValue, tmpAlloc, tmpDealloc, vectorView;
/// Stmt block = Block({
/// tmpAlloc = alloc(tmpMemRefType),
/// vectorView = vector_type_cast(tmpAlloc, vectorMemRefType),
/// vectorView = vector.type_cast(tmpAlloc, vectorMemRefType),
/// For(ivs, lbs, ubs, steps, {
/// scalarValue = load(scalarMemRef,
/// accessInfo.clippedScalarAccessExprs), store(scalarValue, tmpAlloc,

View File

@ -1,4 +1,4 @@
//===- SuperVectorOps.h - MLIR Super Vectorizer Operations ------*- C++ -*-===//
//===- VectorOps.h - MLIR Super Vectorizer Operations -----------*- C++ -*-===//
//
// Copyright 2019 The MLIR Authors.
//
@ -20,8 +20,8 @@
//
//===----------------------------------------------------------------------===//
#ifndef MLIR_INCLUDE_MLIR_SUPERVECTOROPS_SUPERVECTOROPS_H
#define MLIR_INCLUDE_MLIR_SUPERVECTOROPS_SUPERVECTOROPS_H
#ifndef MLIR_VECTOROPS_VECTOROPS_H
#define MLIR_VECTOROPS_VECTOROPS_H
#include "mlir/IR/Attributes.h"
#include "mlir/IR/Dialect.h"
@ -31,9 +31,9 @@
namespace mlir {
/// Dialect for super-vectorization Ops.
class SuperVectorOpsDialect : public Dialect {
class VectorOpsDialect : public Dialect {
public:
SuperVectorOpsDialect(MLIRContext *context);
VectorOpsDialect(MLIRContext *context);
};
/// VectorTransferReadOp performs a blocking read from a scalar memref
@ -70,14 +70,12 @@ public:
/// ...
/// %val = `ssa-value` : f32
/// // let %i, %j, %k, %l be ssa-values of type index
/// %v0 = vector_transfer_read %src, %i, %j, %k, %l
/// %v0 = vector.transfer_read %src[%i, %j, %k, %l]
/// {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
/// (memref<?x?x?x?xf32>, index, index, index, index) ->
/// vector<16x32x64xf32>
/// %v1 = vector_transfer_read %src, %i, %j, %k, %l, %val
/// memref<?x?x?x?xf32>, vector<16x32x64xf32>
/// %v1 = vector.transfer_read %src[%i, %j, %k, %l], (%val)
/// {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
/// (memref<?x?x?x?xf32>, index, index, index, index, f32) ->
/// vector<16x32x64xf32>
/// memref<?x?x?x?xf32>, vector<16x32x64xf32>
/// ```
///
/// Example with partial rank permutation_map:
@ -86,10 +84,9 @@ public:
/// %A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>
/// ...
/// // let %i, %j be ssa-values of type index
/// %v0 = vector_transfer_read %src, %i, %c0, %c0, %c0
/// %v0 = vector.transfer_read %src[%i, %c0, %c0, %c0]
/// {permutation_map: (d0, d1, d2, d3) -> (0, d1, 0)} :
/// (memref<?x?x?x?xf32>, index, index, index, index) ->
/// vector<16x32x64xf32>
/// memref<?x?x?x?xf32>, vector<16x32x64xf32>
class VectorTransferReadOp
: public Op<VectorTransferReadOp, OpTrait::VariadicOperands,
OpTrait::OneResult> {
@ -98,7 +95,7 @@ class VectorTransferReadOp
public:
using Op::Op;
static StringRef getOperationName() { return "vector_transfer_read"; }
static StringRef getOperationName() { return "vector.transfer_read"; }
static StringRef getPermutationMapAttrName() { return "permutation_map"; }
static void build(Builder *builder, OperationState *result,
VectorType vectorType, Value *srcMemRef,
@ -133,7 +130,7 @@ public:
///
/// A vector transfer write has semantics similar to a vector store, with
/// additional support for handling out-of-bounds situations. It is the
/// responsibility of vector_transfer_write's implementation to ensure the
/// responsibility of vector.transfer_write's implementation to ensure the
/// memory writes are valid. Different implementations may be pertinent
/// depending on the hardware support including:
/// 1. predication;
@ -146,9 +143,9 @@ public:
/// %A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>.
/// %val = `ssa-value` : vector<16x32x64xf32>
/// // let %i, %j, %k, %l be ssa-values of type index
/// vector_transfer_write %val, %src, %i, %j, %k, %l
/// vector.transfer_write %val, %src[%i, %j, %k, %l]
/// {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
/// vector<16x32x64xf32>, memref<?x?x?x?xf32>, index, index, index, index
/// vector<16x32x64xf32>, memref<?x?x?x?xf32>
/// ```
class VectorTransferWriteOp
: public Op<VectorTransferWriteOp, OpTrait::VariadicOperands,
@ -162,7 +159,7 @@ class VectorTransferWriteOp
public:
using Op::Op;
static StringRef getOperationName() { return "vector_transfer_write"; }
static StringRef getOperationName() { return "vector.transfer_write"; }
static StringRef getPermutationMapAttrName() { return "permutation_map"; }
static void build(Builder *builder, OperationState *result, Value *srcVector,
Value *dstMemRef, ArrayRef<Value *> dstIndices,
@ -190,14 +187,14 @@ public:
///
/// ```mlir
/// %A = alloc() : memref<5x4x3xf32>
/// %VA = vector_type_cast %A : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>>
/// %VA = vector.type_cast %A : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>>
/// ```
class VectorTypeCastOp
: public Op<VectorTypeCastOp, OpTrait::OneOperand, OpTrait::OneResult> {
public:
using Op::Op;
static StringRef getOperationName() { return "vector_type_cast"; }
static StringRef getOperationName() { return "vector.type_cast"; }
static void build(Builder *builder, OperationState *result, Value *srcVector,
Type dstType);
static bool parse(OpAsmParser *parser, OperationState *result);
@ -207,4 +204,4 @@ public:
} // end namespace mlir
#endif // MLIR_INCLUDE_MLIR_SUPERVECTOROPS_SUPERVECTOROPS_H
#endif // MLIR_VECTOROPS_VECTOROPS_H


@ -30,9 +30,9 @@
#include "mlir/IR/Builders.h"
#include "mlir/IR/Operation.h"
#include "mlir/StandardOps/Ops.h"
#include "mlir/SuperVectorOps/SuperVectorOps.h"
#include "mlir/Support/Functional.h"
#include "mlir/Support/MathExtras.h"
#include "mlir/VectorOps/VectorOps.h"
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/SmallString.h"


@ -22,9 +22,9 @@
#include "mlir/IR/IntegerSet.h"
#include "mlir/IR/Operation.h"
#include "mlir/StandardOps/Ops.h"
#include "mlir/SuperVectorOps/SuperVectorOps.h"
#include "mlir/Support/Functional.h"
#include "mlir/Support/STLExtras.h"
#include "mlir/VectorOps/VectorOps.h"
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/SetVector.h"
@ -185,11 +185,11 @@ AffineMap mlir::makePermutationMap(
return ::makePermutationMap(store.getIndices(), enclosingLoopToVectorDim);
}
bool mlir::matcher::operatesOnSuperVectors(Operation &op,
VectorType subVectorType) {
bool mlir::matcher::operatesOnSuperVectorsOf(Operation &op,
VectorType subVectorType) {
// First, extract the vector type and distinguish between:
// a. ops that *must* lower a super-vector (i.e. vector_transfer_read,
// vector_transfer_write); and
// a. ops that *must* lower a super-vector (i.e. vector.transfer_read,
// vector.transfer_write); and
// b. ops that *may* lower a super-vector (all other ops).
// The ops that *may* lower a super-vector only do so if the super-vector to
// sub-vector ratio exists. The ops that *must* lower a super-vector are
@ -218,7 +218,7 @@ bool mlir::matcher::operatesOnSuperVectors(Operation &op,
return false;
}
} else {
// Not a vector_transfer and has more than 1 result, fail hard for now to
// Not a vector.transfer and has more than 1 result, fail hard for now to
// wake us up when something changes.
op.emitError("NYI: operation has more than 1 result");
return false;
@ -229,7 +229,7 @@ bool mlir::matcher::operatesOnSuperVectors(Operation &op,
// Sanity check.
assert((ratio.hasValue() || !mustDivide) &&
"vector_transfer operation in which super-vector size is not an"
"vector.transfer operation in which super-vector size is not an"
" integer multiple of sub-vector size");
// This catches cases that are not strictly necessary to have multiplicity but


@ -18,7 +18,7 @@
#include "mlir/EDSC/Intrinsics.h"
#include "mlir/EDSC/Builders.h"
#include "mlir/IR/AffineExpr.h"
#include "mlir/SuperVectorOps/SuperVectorOps.h"
#include "mlir/VectorOps/VectorOps.h"
using namespace mlir;
using namespace mlir::edsc;


@ -32,9 +32,9 @@
#include "mlir/IR/Operation.h"
#include "mlir/IR/Value.h"
#include "mlir/StandardOps/Ops.h"
#include "mlir/SuperVectorOps/SuperVectorOps.h"
#include "mlir/Support/Functional.h"
#include "mlir/Support/STLExtras.h"
#include "mlir/VectorOps/VectorOps.h"
using llvm::dbgs;
using llvm::errs;


@ -27,8 +27,8 @@
#include "mlir/IR/OperationSupport.h"
#include "mlir/IR/StandardTypes.h"
#include "mlir/StandardOps/Ops.h"
#include "mlir/SuperVectorOps/SuperVectorOps.h"
#include "mlir/Support/STLExtras.h"
#include "mlir/VectorOps/VectorOps.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/StringSwitch.h"


@ -38,10 +38,10 @@
#include "mlir/IR/Types.h"
#include "mlir/Pass/Pass.h"
#include "mlir/StandardOps/Ops.h"
#include "mlir/SuperVectorOps/SuperVectorOps.h"
#include "mlir/Support/Functional.h"
#include "mlir/Transforms/MLPatternLoweringPass.h"
#include "mlir/Transforms/Passes.h"
#include "mlir/VectorOps/VectorOps.h"
/// Implements lowering of VectorTransferReadOp and VectorTransferWriteOp to a
/// proper abstraction for the hardware.
@ -58,9 +58,9 @@
/// affine.for %i0 = 0 to %0 {
/// affine.for %i1 = 0 to %1 step 256 {
/// affine.for %i2 = 0 to %2 step 32 {
/// %v = vector_transfer_read %A, %i0, %i1, %i2, %f0
/// %v = vector.transfer_read %A[%i0, %i1, %i2], (%f0)
/// {permutation_map: (d0, d1, d2) -> (d2, d1)} :
/// (memref<?x?x?xf32>, index, index, f32) -> vector<32x256xf32>
/// memref<?x?x?xf32>, vector<32x256xf32>
/// }}}
/// ```
///


@ -37,10 +37,10 @@
#include "mlir/IR/Types.h"
#include "mlir/Pass/Pass.h"
#include "mlir/StandardOps/Ops.h"
#include "mlir/SuperVectorOps/SuperVectorOps.h"
#include "mlir/Support/Functional.h"
#include "mlir/Support/LLVM.h"
#include "mlir/Transforms/Passes.h"
#include "mlir/VectorOps/VectorOps.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
@ -54,7 +54,7 @@
/// a target-independent way: the target vector size is specified as a parameter
/// to the pass. This pass is thus a partial lowering that opens the "greybox"
/// that is the super-vector abstraction. In particular, this pass can turn the
/// vector_transfer_read and vector_transfer_write ops in either:
/// vector.transfer_read and vector.transfer_write ops in either:
/// 1. a loop nest with either scalar and vector load/store operations; or
/// 2. a loop-nest with DmaStartOp / DmaWaitOp; or
/// 3. a pre-existing blackbox library call that can be written manually or
@ -79,14 +79,14 @@
/// do not vectorize in the presence of conditionals for now, sliced chains are
/// guaranteed not to escape the innermost scope, which has to be either the top
/// Function scope or the innermost loop scope, by construction. As a
/// consequence, the implementation just starts from vector_transfer_write
/// consequence, the implementation just starts from vector.transfer_write
/// operations and builds the slice scoped the innermost loop enclosing the
/// current vector_transfer_write. These assumptions and the implementation
/// current vector.transfer_write. These assumptions and the implementation
/// details are subject to revision in the future.
///
/// Example
/// ========
/// In the following, the single vector_transfer_write op operates on a
/// In the following, the single vector.transfer_write op operates on a
/// vector<4x4x4xf32>. Let's assume the HW supports vector<4x4xf32>.
/// Materialization is achieved by instantiating each occurrence of the leading
/// dimension of vector<4x4x4xf32> into a vector<4x4xf32>.
@ -98,16 +98,15 @@
///
/// ```mlir
/// mlfunc @materialize(%M : index, %N : index, %O : index, %P : index) {
/// %A = alloc (%M, %N, %O, %P) : memref<?x?x?x?xf32, 0>
/// %A = alloc (%M, %N, %O, %P) : memref<?x?x?x?xf32>
/// %f1 = constant splat<vector<4x4x4xf32>, 1.000000e+00> :
/// vector<4x4x4xf32> affine.for %i0 = 0 to %M step 4 {
/// affine.for %i1 = 0 to %N step 4 {
/// affine.for %i2 = 0 to %O {
/// affine.for %i3 = 0 to %P step 4 {
/// vector_transfer_write %f1, %A, %i0, %i1, %i2, %i3
/// vector.transfer_write %f1, %A[%i0, %i1, %i2, %i3]
/// {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d0)} :
/// vector<4x4x4xf32>, memref<?x?x?x?xf32, 0>,
/// index, index, index, index
/// vector<4x4x4xf32>, memref<?x?x?x?xf32>
/// }}}}
/// return
/// }
@ -123,30 +122,21 @@
/// affine.for %i1 = 0 to %arg1 step 4 {
/// affine.for %i2 = 0 to %arg2 {
/// affine.for %i3 = 0 to %arg3 step 4 {
/// %1 = affine.apply (d0, d1, d2, d3) -> (d0, d1, d2, d3)
/// (%i0, %i1, %i2, %i3)
/// vector_transfer_write f1, %0, %1#0, %1#1, %1#2, %1#3
/// vector.transfer_write f1, %0[%i0, %i1, %i2, %i3]
/// {permutation_map: (d0, d1, d2, d3) -> (d1, d0)} :
/// vector<4x4xf32>, memref<?x?x?x?xf32>,
/// index, index, index, index
/// %2 = affine.apply (d0, d1, d2, d3) -> (d0, d1, d2, d3 + 1)
/// (%i0, %i1, %i2, %i3)
/// vector_transfer_write {{.*}}, %0, %2#0, %2#1, %2#2, %2#3
/// vector<4x4xf32>, memref<?x?x?x?xf32>
/// %i3p1 = affine.apply (d0) -> (d0 + 1)(%i3)
/// vector.transfer_write {{.*}}, %0[%i0, %i1, %i2, %i3p1]
/// {permutation_map: (d0, d1, d2, d3) -> (d1, d0)} :
/// vector<4x4xf32>, memref<?x?x?x?xf32>,
/// index, index, index, index
/// %3 = affine.apply (d0, d1, d2, d3) -> (d0, d1, d2, d3 + 2)
/// (%i0, %i1, %i2, %i3)
/// vector_transfer_write {{.*}}, %0, %3#0, %3#1, %3#2, %3#3
/// vector<4x4xf32>, memref<?x?x?x?xf32>
/// %i3p2 = affine.apply (d0) -> (d0 + 2)(%i3)
/// vector.transfer_write {{.*}}, %0[%i0, %i1, %i2, %i3p2]
/// {permutation_map: (d0, d1, d2, d3) -> (d1, d0)} :
/// vector<4x4xf32>, memref<?x?x?x?xf32>,
/// index, index, index, index
/// %4 = affine.apply (d0, d1, d2, d3) -> (d0, d1, d2, d3 + 3)
/// (%i0, %i1, %i2, %i3)
/// vector_transfer_write {{.*}}, %0, %4#0, %4#1, %4#2, %4#3
/// vector<4x4xf32>, memref<?x?x?x?xf32>
/// %i3p3 = affine.apply (d0) -> (d0 + 3)(%i3)
/// vector.transfer_write {{.*}}, %0[%i0, %i1, %i2, %i3p3]
/// {permutation_map: (d0, d1, d2, d3) -> (d1, d0)} :
/// vector<4x4xf32>, memref<?x?x?x?xf32>,
/// index, index, index, index
/// vector<4x4xf32>, memref<?x?x?x?xf32>
/// }}}}
/// return
/// }
@ -282,11 +272,11 @@ static Value *substitute(Value *v, VectorType hwVectorType,
/// Returns a list of single result AffineApplyOps that reindex the
/// `memRefIndices` by the multi-dimensional `hwVectorInstance`. This is used by
/// the function that materializes a vector_transfer operation to use hardware
/// the function that materializes a vector.transfer operation to use hardware
/// vector types instead of super-vector types.
///
/// The general problem this function solves is as follows:
/// Assume a vector_transfer operation at the super-vector granularity that has
/// Assume a vector.transfer operation at the super-vector granularity that has
/// `l` enclosing loops (AffineForOp). Assume the vector transfer operation
/// operates on a MemRef of rank `r`, a super-vector of rank `s` and a hardware
/// vector of rank `h`. For the purpose of illustration assume l==4, r==3, s==2,
@ -299,8 +289,8 @@ static Value *substitute(Value *v, VectorType hwVectorType,
/// affine.for %i1 = 0 to %N step 3 {
/// affine.for %i2 = 0 to %O {
/// affine.for %i3 = 0 to %P step 32 {
/// %r = vector_transfer_read(%A, map(%i..)#0, map(%i..)#1, map(%i..)#2)
/// -> vector<3x32xf32>
/// %r = vector.transfer_read(%A, map0(%i..), map1(%i..), map2(%i..)) :
/// vector<3x32xf32>, memref<?x?x?xf32>
/// ...
/// }}}}
/// ```
@ -308,28 +298,31 @@ static Value *substitute(Value *v, VectorType hwVectorType,
/// where map denotes an AffineMap operating on enclosing loops with properties
/// compatible for vectorization (i.e. some contiguity left unspecified here).
/// Note that the vectorized loops are %i1 and %i3.
/// This function translates the vector_transfer_read operation to multiple
/// instances of vector_transfer_read that operate on vector<8x32>.
/// This function translates the vector.transfer_read operation to multiple
/// instances of vector.transfer_read that operate on vector<8x32>.
///
/// Without loss of generality, we assume hwVectorInstance is: {2, 1}.
/// The only constraint on hwVectorInstance is that it belongs to:
/// [0, 2] x [0, 3], which is the span of the ratio of super-vector shape to
/// hardware vector shape in our example.
///
/// This function instantiates the iteration <2, 1> of vector_transfer_read
/// This function instantiates the iteration <2, 1> of vector.transfer_read
/// into the set of operations in pseudo-MLIR:
///
/// ```mlir
/// map2 = (d0, d1, d2, d3) -> (d0, d1 + 2, d2, d3 + 1 * 8)
/// map3 = map o map2 // where o denotes composition
/// %r = vector_transfer_read(%A, map3(%i..)#0, map3(%i..)#1, map3(%i..)#2)
/// -> vector<3x32xf32>
/// #map2 = (d0, d1, d2, d3) -> (d0, d1 + 2, d2, d3 + 1 * 8)
/// #map3 = #map o #map2 // where o denotes composition
/// aff0 = affine.apply #map3.0(%i..)
/// aff1 = affine.apply #map3.1(%i..)
/// aff2 = affine.apply #map3.2(%i..)
/// %r = vector.transfer_read(%A, %aff0, %aff1, %aff2):
/// vector<3x32xf32>, memref<?x?x?xf32>
/// ```
///
/// Practical considerations
/// ========================
/// For now, `map` is assumed to be the identity map and the indices are
/// specified just as vector_transfer_read(%A, %i0, %i1, %i2, %i3). This will be
/// specified just as vector.transfer_read %A[%i0, %i1, %i2, %i3]. This will be
/// extended in the future once we have a proper Op for vector transfers.
/// Additionally, the example above is specified in pseudo-MLIR form; once we
/// have proper support for generic maps we can generate the code and show
@ -347,7 +340,7 @@ reindexAffineIndices(FuncBuilder *b, VectorType hwVectorType,
unsigned numIndices = memrefIndices.size();
auto numMemRefIndices = numIndices - hwVectorInstance.size();
auto numSuperVectorIndices = hwVectorInstance.size() - vectorShape.size();
auto numVectorIndices = hwVectorInstance.size() - vectorShape.size();
SmallVector<AffineExpr, 8> affineExprs;
// TODO(ntv): support a concrete map and composition.
@ -358,11 +351,10 @@ reindexAffineIndices(FuncBuilder *b, VectorType hwVectorType,
auto d_i = b->getAffineDimExpr(i);
affineExprs.push_back(d_i);
}
// The next numSuperVectorIndices correspond to super-vector dimensions that
// The next numVectorIndices correspond to super-vector dimensions that
// do not have a hardware vector dimension counterpart. For those we only
// need to increment the index by the corresponding hwVectorInstance.
for (i = numMemRefIndices; i < numMemRefIndices + numSuperVectorIndices;
++i) {
for (i = numMemRefIndices; i < numMemRefIndices + numVectorIndices; ++i) {
auto d_i = b->getAffineDimExpr(i);
auto offset = hwVectorInstance[i - numMemRefIndices];
affineExprs.push_back(d_i + offset);
@ -374,7 +366,7 @@ reindexAffineIndices(FuncBuilder *b, VectorType hwVectorType,
for (; i < numIndices; ++i) {
auto d_i = b->getAffineDimExpr(i);
auto offset = hwVectorInstance[i - numMemRefIndices];
auto stride = vectorShape[i - numMemRefIndices - numSuperVectorIndices];
auto stride = vectorShape[i - numMemRefIndices - numVectorIndices];
affineExprs.push_back(d_i + offset * stride);
}
@ -535,14 +527,14 @@ static Operation *instantiate(FuncBuilder *b, VectorTransferWriteOp write,
/// The multi-dimensional `hwVectorInstance` belongs to the shapeRatio of
/// super-vector type to hw vector type.
/// A cloned instance of `op` is formed as follows:
/// 1. vector_transfer_read: the return `superVectorType` is replaced by
/// 1. vector.transfer_read: the return `superVectorType` is replaced by
/// `hwVectorType`. Additionally, affine indices are reindexed with
/// `reindexAffineIndices` using `hwVectorInstance` and vector type
/// information;
/// 2. vector_transfer_write: the `valueToStore` type is simply substituted.
/// 2. vector.transfer_write: the `valueToStore` type is simply substituted.
/// Since we operate on a topologically sorted slice, a substitution must
/// have been registered for non-constant ops. Additionally, affine indices
/// are reindexed in the same way as for vector_transfer_read;
/// are reindexed in the same way as for vector.transfer_read;
/// 3. constant ops are splats of the super-vector type by construction.
/// They are cloned to a splat on the hw vector type with the same value;
/// 4. remaining ops are cloned to version of the op that returns a hw vector
@ -660,7 +652,7 @@ static bool emitSlice(MaterializationState *state,
}
/// Materializes super-vector types into concrete hw vector types as follows:
/// 1. start from super-vector terminators (current vector_transfer_write
/// 1. start from super-vector terminators (current vector.transfer_write
/// ops);
/// 2. collect all the operations that can be reached by transitive use-defs
/// chains;
@ -755,13 +747,13 @@ void MaterializeVectorsPass::runOnFunction() {
auto subVectorType =
VectorType::get(hwVectorSize, FloatType::getF32(&getContext()));
// Capture terminators; i.e. vector_transfer_write ops involving a strict
// Capture terminators; i.e. vector.transfer_write ops involving a strict
// super-vector of subVectorType.
auto filter = [subVectorType](Operation &op) {
if (!op.isa<VectorTransferWriteOp>()) {
return false;
}
return matcher::operatesOnSuperVectors(op, subVectorType);
return matcher::operatesOnSuperVectorsOf(op, subVectorType);
};
auto pat = Op(filter);
SmallVector<NestedMatch, 8> matches;


@ -108,7 +108,7 @@ void VectorizerTestPass::testVectorShapeRatio(llvm::raw_ostream &outs) {
assert(subVectorType.getElementType() ==
FloatType::getF32(subVectorType.getContext()) &&
"Only f32 supported for now");
if (!matcher::operatesOnSuperVectors(op, subVectorType)) {
if (!matcher::operatesOnSuperVectorsOf(op, subVectorType)) {
return false;
}
if (op.getNumResults() != 1) {


@ -32,10 +32,10 @@
#include "mlir/IR/Types.h"
#include "mlir/Pass/Pass.h"
#include "mlir/StandardOps/Ops.h"
#include "mlir/SuperVectorOps/SuperVectorOps.h"
#include "mlir/Support/Functional.h"
#include "mlir/Support/LLVM.h"
#include "mlir/Transforms/Passes.h"
#include "mlir/VectorOps/VectorOps.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DenseSet.h"
@ -69,7 +69,7 @@ using namespace mlir;
/// Some may prefer the terminology a "tile of HW vectors". In this case, one
/// should note that super-vectors implement an "always full tile" abstraction.
/// They guarantee no partial-tile separation is necessary by relying on a
/// high-level copy-reshape abstraction that we call vector_transfer. This
/// high-level copy-reshape abstraction that we call vector.transfer. This
/// copy-reshape operation is also responsible for performing layout
/// transposition if necessary. In the general case this will require a scoped
/// allocation in some notional local memory.
@ -115,19 +115,17 @@ using namespace mlir;
/// At a high level, a vectorized load in a loop will resemble:
/// ```mlir
/// affine.for %i = ? to ? step ? {
/// %v_a = "vector_transfer_read" (A, %i) : (memref<?xf32>, index) ->
/// vector<128xf32>
/// %v_a = vector.transfer_read A[%i] : memref<?xf32>, vector<128xf32>
/// }
/// ```
/// It is the reponsibility of the implementation of the vector_transfer_read
/// to materialize vector registers from the original scalar memrefs.
/// A later (more target-dependent) lowering pass will materialize to actual HW
/// vector sizes. This lowering may be occur at different times:
/// It is the responsibility of the implementation of vector.transfer_read to
/// materialize vector registers from the original scalar memrefs. A later (more
/// target-dependent) lowering pass will materialize to actual HW vector sizes.
/// This lowering may occur at different times:
/// 1. at the MLIR level into a combination of loops, unrolling, DmaStartOp +
/// DmaWaitOp + vectorized operations
/// for data transformations and shuffle; thus opening opportunities for
/// unrolling and pipelining. This is an instance of library call
/// "whiteboxing"; or
/// DmaWaitOp + vectorized operations for data transformations and shuffle;
/// thus opening opportunities for unrolling and pipelining. This is an
/// instance of library call "whiteboxing"; or
/// 2. later in a target-specific lowering pass or hand-written library
/// call; achieving full separation of concerns. This is an instance of
/// library call; or
@ -225,7 +223,7 @@ using namespace mlir;
/// 3. right after polyhedral-style scheduling: PLUTO-style algorithms are known
/// to improve locality, parallelism and be configurable (e.g. max-fuse,
/// smart-fuse etc). They can also have adverse effects on contiguity
/// properties that are required for vectorization but the vector_transfer
/// properties that are required for vectorization but the vector.transfer
/// copy-reshape-pad-transpose abstraction is expected to help recapture
/// these properties.
/// 4. right after polyhedral-style scheduling+tiling;
@ -265,7 +263,7 @@ using namespace mlir;
/// 3. Then, for each pattern in order:
/// a. applying iterative rewriting of the loop and the load operations in
/// DFS postorder. Rewriting is implemented by coarsening the loops and
/// turning load operations into opaque vector_transfer_read ops;
/// turning load operations into opaque vector.transfer_read ops;
/// b. keeping track of the load operations encountered as "roots" and the
/// store operations as "terminals";
/// c. traversing the use-def chains starting from the roots and iteratively
@ -304,7 +302,7 @@ using namespace mlir;
/// %s5 = addf %a5, %b5 : f32`
///
/// Lastly, we show a minimal example for which use-def chains rooted in load /
/// vector_transfer_read are not enough. This is what motivated splitting
/// vector.transfer_read are not enough. This is what motivated splitting
/// terminal processing out of the use-def chains starting from loads. In the
/// following snippet, there is simply no load:
/// ```mlir
@ -343,8 +341,7 @@ using namespace mlir;
/// scheduling, so we want to generate a pattern that resembles:
/// ```mlir
/// affine.for %i = ? to ? step ? {
/// %v_a = "vector_transfer_read" (A, %i) : (memref<?xf32>, index) ->
/// vector<128xf32>
/// %v_a = vector.transfer_read A[%i] : memref<?xf32>, vector<128xf32>
/// }
/// ```
///
@ -364,8 +361,7 @@ using namespace mlir;
/// abstraction of size 128 returns code similar to:
/// ```mlir
/// affine.for %i = %M to %N step 128 {
/// %v_a = "vector_transfer_read" (A, %i) : (memref<?xf32>, index) ->
/// vector<128xf32>
/// %v_a = vector.transfer_read A[%i] : memref<?xf32>, vector<128xf32>
/// }
/// ```
///
@ -373,9 +369,9 @@ using namespace mlir;
/// ========================================================================
/// 1. lowering to concrete vector types for various HW;
/// 2. reduction support;
/// 3. non-effecting padding during vector_transfer_read and filter during
/// vector_transfer_write;
/// 4. misalignment support vector_transfer_read / vector_transfer_write
/// 3. non-effecting padding during vector.transfer_read and filter during
/// vector.transfer_write;
/// 4. misalignment support vector.transfer_read / vector.transfer_write
/// (hopefully without read-modify-writes);
/// 5. control-flow support;
/// 6. cost-models, heuristics and search;
@ -443,24 +439,24 @@ using namespace mlir;
/// affine.for %i1 = 0 to %arg1 step 256 {
/// %cst_1 = constant splat<vector<256xf32>, 1.0> :
/// vector<256xf32>
/// "vector_transfer_write"(%cst_1, %0, %i0, %i1) :
/// (vector<256xf32>, memref<?x?xf32>, index, index) -> ()
/// vector.transfer_write %cst_1, %0[%i0, %i1] :
/// vector<256xf32>, memref<?x?xf32>
/// }
/// }
/// affine.for %i2 = 0 to %arg0 {
/// affine.for %i3 = 0 to %arg1 step 256 {
/// %cst_2 = constant splat<vector<256xf32>, 2.0> :
/// vector<256xf32>
/// "vector_transfer_write"(%cst_2, %1, %i2, %i3) :
/// (vector<256xf32>, memref<?x?xf32>, index, index) -> ()
/// vector.transfer_write %cst_2, %1[%i2, %i3] :
/// vector<256xf32>, memref<?x?xf32>
/// }
/// }
/// affine.for %i4 = 0 to %arg0 {
/// affine.for %i5 = 0 to %arg1 step 256 {
/// %3 = "vector_transfer_read"(%0, %i4, %i5) :
/// (memref<?x?xf32>, index, index) -> vector<256xf32>
/// %4 = "vector_transfer_read"(%1, %i4, %i5) :
/// (memref<?x?xf32>, index, index) -> vector<256xf32>
/// %3 = vector.transfer_read %0[%i4, %i5] :
/// memref<?x?xf32>, vector<256xf32>
/// %4 = vector.transfer_read %1[%i4, %i5] :
/// memref<?x?xf32>, vector<256xf32>
/// %5 = addf %3, %4 : vector<256xf32>
/// %cst_3 = constant splat<vector<256xf32>, 1.0> :
/// vector<256xf32>
@ -469,8 +465,8 @@ using namespace mlir;
/// vector<256xf32>
/// %7 = addf %5, %cst_4 : vector<256xf32>
/// %8 = addf %7, %6 : vector<256xf32>
/// "vector_transfer_write"(%8, %2, %i4, %i5) :
/// (vector<256xf32>, memref<?x?xf32>, index, index) -> ()
/// vector.transfer_write %8, %2[%i4, %i5] :
/// vector<256xf32>, memref<?x?xf32>
/// }
/// }
/// %c7 = constant 7 : index
@ -499,24 +495,24 @@ using namespace mlir;
/// affine.for %i1 = 0 to %arg1 step 256 {
/// %cst_1 = constant splat<vector<32x256xf32>, 1.0> :
/// vector<32x256xf32>
/// "vector_transfer_write"(%cst_1, %0, %i0, %i1) :
/// (vector<32x256xf32>, memref<?x?xf32>, index, index) -> ()
/// vector.transfer_write %cst_1, %0[%i0, %i1] :
/// vector<32x256xf32>, memref<?x?xf32>
/// }
/// }
/// affine.for %i2 = 0 to %arg0 step 32 {
/// affine.for %i3 = 0 to %arg1 step 256 {
/// %cst_2 = constant splat<vector<32x256xf32>, 2.0> :
/// vector<32x256xf32>
/// "vector_transfer_write"(%cst_2, %1, %i2, %i3) :
/// (vector<32x256xf32>, memref<?x?xf32>, index, index) -> ()
/// vector.transfer_write %cst_2, %1[%i2, %i3] :
/// vector<32x256xf32>, memref<?x?xf32>
/// }
/// }
/// affine.for %i4 = 0 to %arg0 step 32 {
/// affine.for %i5 = 0 to %arg1 step 256 {
/// %3 = "vector_transfer_read"(%0, %i4, %i5) :
/// (memref<?x?xf32>, index, index) -> vector<32x256xf32>
/// %4 = "vector_transfer_read"(%1, %i4, %i5) :
/// (memref<?x?xf32>, index, index) -> vector<32x256xf32>
/// %3 = vector.transfer_read %0[%i4, %i5] :
/// memref<?x?xf32>, vector<32x256xf32>
/// %4 = vector.transfer_read %1[%i4, %i5] :
/// memref<?x?xf32>, vector<32x256xf32>
/// %5 = addf %3, %4 : vector<32x256xf32>
/// %cst_3 = constant splat<vector<32x256xf32>, 1.0> :
/// vector<32x256xf32>
@ -525,8 +521,8 @@ using namespace mlir;
/// vector<32x256xf32>
/// %7 = addf %5, %cst_4 : vector<32x256xf32>
/// %8 = addf %7, %6 : vector<32x256xf32>
/// "vector_transfer_write"(%8, %2, %i4, %i5) :
/// (vector<32x256xf32>, memref<?x?xf32>, index, index) -> ()
/// vector.transfer_write %8, %2[%i4, %i5] :
/// vector<32x256xf32>, memref<?x?xf32>
/// }
/// }
/// %c7 = constant 7 : index
@ -788,7 +784,7 @@ void VectorizationState::registerReplacement(Value *key, Value *value) {
/// Handles the vectorization of load and store MLIR operations.
///
/// LoadOp operations are the roots of the vectorizeNonTerminals call. They are
/// vectorized immediately. The resulting vector_transfer_read is immediately
/// vectorized immediately. The resulting vector.transfer_read is immediately
/// registered to replace all uses of the LoadOp in this pattern's scope.
///
/// StoreOp are the terminals of the vectorizeNonTerminals call. They need to be
@ -811,9 +807,9 @@ static LogicalResult vectorizeRootOrTerminal(Value *iv,
// Materialize a MemRef with 1 vector.
auto *opInst = memoryOp.getOperation();
// For now, vector_transfers must be aligned, operate only on indices with an
// For now, vector.transfers must be aligned, operate only on indices with an
// identity subset of AffineMap and do not change layout.
// TODO(ntv): increase the expressiveness power of vector_transfer operations
// TODO(ntv): increase the expressiveness power of vector.transfer operations
// as needed by various targets.
if (opInst->template isa<LoadOp>()) {
auto permutationMap =
@ -835,7 +831,7 @@ static LogicalResult vectorizeRootOrTerminal(Value *iv,
/// end TODO(ntv): Hoist to a VectorizationMaterialize.cpp when appropriate. ///
/// Coarsens the loops bounds and transforms all remaining load and store
/// operations into the appropriate vector_transfer.
/// operations into the appropriate vector.transfer.
static LogicalResult vectorizeAffineForOp(AffineForOp loop, int64_t step,
VectorizationState *state) {
using namespace functional;
@ -1023,9 +1019,9 @@ static Operation *vectorizeOneOperation(Operation *opInst,
assert(!opInst->isa<LoadOp>() &&
"all loads must have already been fully vectorized independently");
assert(!opInst->isa<VectorTransferReadOp>() &&
"vector_transfer_read cannot be further vectorized");
"vector.transfer_read cannot be further vectorized");
assert(!opInst->isa<VectorTransferWriteOp>() &&
"vector_transfer_write cannot be further vectorized");
"vector.transfer_write cannot be further vectorized");
if (auto store = opInst->dyn_cast<StoreOp>()) {
auto *memRef = store.getMemRef();


@ -15,8 +15,8 @@
// limitations under the License.
// =============================================================================
#include "mlir/SuperVectorOps/SuperVectorOps.h"
#include "mlir/VectorOps/VectorOps.h"
using namespace mlir;
// Static initialization for SuperVectorOps dialect registration.
static DialectRegistration<SuperVectorOpsDialect> SuperVectorOps;
// Static initialization for VectorOps dialect registration.
static DialectRegistration<VectorOpsDialect> VectorOps;


@ -1,4 +1,4 @@
//===- SuperVectorOps.cpp - MLIR Super Vectorizer Operations---------------===//
//===- VectorOps.cpp - MLIR Super Vectorizer Operations -------------------===//
//
// Copyright 2019 The MLIR Authors.
//
@ -20,7 +20,7 @@
//
//===----------------------------------------------------------------------===//
#include "mlir/SuperVectorOps/SuperVectorOps.h"
#include "mlir/VectorOps/VectorOps.h"
#include "mlir/IR/AffineExpr.h"
#include "mlir/IR/AffineMap.h"
#include "mlir/IR/Builders.h"
@ -29,11 +29,11 @@
using namespace mlir;
//===----------------------------------------------------------------------===//
// SuperVectorOpsDialect
// VectorOpsDialect
//===----------------------------------------------------------------------===//
SuperVectorOpsDialect::SuperVectorOpsDialect(MLIRContext *context)
: Dialect(/*namePrefix=*/"", context) {
VectorOpsDialect::VectorOpsDialect(MLIRContext *context)
: Dialect("vector", context) {
addOperations<VectorTransferReadOp, VectorTransferWriteOp,
VectorTypeCastOp>();
}
@ -107,88 +107,65 @@ AffineMap VectorTransferReadOp::getPermutationMap() {
void VectorTransferReadOp::print(OpAsmPrinter *p) {
*p << getOperationName() << " ";
p->printOperand(getMemRef());
*p << ", ";
*p << "[";
p->printOperands(getIndices());
*p << "]";
auto optionalPaddingValue = getPaddingValue();
if (optionalPaddingValue) {
*p << ", ";
*p << ", (";
p->printOperand(*optionalPaddingValue);
*p << ")";
}
p->printOptionalAttrDict(getAttrs());
// Construct the FunctionType and print it.
llvm::SmallVector<Type, 8> inputs{getMemRefType()};
// Must have at least one actual index, see verify.
Value *firstIndex = *getIndices().begin();
Type indexType = firstIndex->getType();
inputs.append(getMemRefType().getRank(), indexType);
if (optionalPaddingValue) {
inputs.push_back((*optionalPaddingValue)->getType());
}
*p << " : "
<< FunctionType::get(inputs, {getResultType()}, indexType.getContext());
*p << " : " << getMemRefType();
*p << ", " << getResultType();
}
bool VectorTransferReadOp::parse(OpAsmParser *parser, OperationState *result) {
SmallVector<OpAsmParser::OperandType, 8> parsedOperands;
Type type;
OpAsmParser::OperandType memrefInfo;
SmallVector<OpAsmParser::OperandType, 8> indexInfo;
SmallVector<OpAsmParser::OperandType, 8> paddingInfo;
SmallVector<Type, 2> types;
// Parsing with support for optional paddingValue.
auto fail = parser->parseOperandList(parsedOperands) ||
parser->parseOptionalAttributeDict(result->attributes) ||
parser->parseColonType(type);
if (fail) {
if (parser->parseOperand(memrefInfo) ||
parser->parseOperandList(indexInfo, -1, OpAsmParser::Delimiter::Square) ||
parser->parseTrailingOperandList(paddingInfo, -1,
OpAsmParser::Delimiter::Paren) ||
parser->parseOptionalAttributeDict(result->attributes) ||
parser->parseColonTypeList(types))
return true;
}
// Resolution.
auto funType = type.dyn_cast<FunctionType>();
if (!funType)
return parser->emitError(parser->getNameLoc(), "Function type expected");
if (funType.getNumInputs() < 1)
return parser->emitError(parser->getNameLoc(),
"Function type expects at least one input");
MemRefType memrefType =
funType.getInput(Offsets::MemRefOffset).dyn_cast<MemRefType>();
if (types.size() != 2)
return parser->emitError(parser->getNameLoc(), "expected 2 types");
MemRefType memrefType = types[0].dyn_cast<MemRefType>();
if (!memrefType)
return parser->emitError(parser->getNameLoc(),
"MemRef type expected for first input");
if (funType.getNumResults() < 1)
return parser->emitError(parser->getNameLoc(),
"Function type expects exactly one vector result");
VectorType vectorType = funType.getResult(0).dyn_cast<VectorType>();
return parser->emitError(parser->getNameLoc(), "memRef type expected");
VectorType vectorType = types[1].dyn_cast<VectorType>();
if (!vectorType)
return parser->emitError(parser->getNameLoc(),
"Vector type expected for first result");
if (parsedOperands.size() != funType.getNumInputs())
return parser->emitError(parser->getNameLoc(),
"requires " + Twine(funType.getNumInputs()) +
" operands");
return parser->emitError(parser->getNameLoc(), "vector type expected");
// Extract optional paddingValue.
OpAsmParser::OperandType memrefInfo = parsedOperands[0];
// At this point, indexInfo may contain the optional paddingValue, pop it out.
SmallVector<OpAsmParser::OperandType, 8> indexInfo{
parsedOperands.begin() + Offsets::FirstIndexOffset, parsedOperands.end()};
if (indexInfo.size() != memrefType.getRank())
return parser->emitError(parser->getNameLoc(),
"expected " + Twine(memrefType.getRank()) +
" indices to the memref");
if (paddingInfo.size() > 1)
return parser->emitError(parser->getNameLoc(),
"expected at most one padding value");
Type paddingType;
OpAsmParser::OperandType paddingValue;
bool hasPaddingValue = indexInfo.size() > memrefType.getRank();
unsigned expectedNumOperands = Offsets::FirstIndexOffset +
memrefType.getRank() +
(hasPaddingValue ? 1 : 0);
if (hasPaddingValue) {
paddingType = funType.getInputs().back();
paddingValue = indexInfo.pop_back_val();
bool hasOptionalPaddingValue = !paddingInfo.empty();
if (hasOptionalPaddingValue) {
paddingType = vectorType.getElementType();
}
if (funType.getNumInputs() != expectedNumOperands)
return parser->emitError(
parser->getNameLoc(),
"requires actual number of operands to match function type");
auto indexType = parser->getBuilder().getIndexType();
return parser->resolveOperand(memrefInfo, memrefType, result->operands) ||
parser->resolveOperands(indexInfo, indexType, result->operands) ||
(hasPaddingValue && parser->resolveOperand(paddingValue, paddingType,
result->operands)) ||
(hasOptionalPaddingValue &&
parser->resolveOperand(paddingInfo[0], paddingType,
result->operands)) ||
parser->addTypeToList(vectorType, result->types);
}
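For reference, a sketch of the two textual forms this parser accepts, without and with the optional padding value (SSA names and types are assumed, mirroring the op documentation examples):

```mlir
%v0 = vector.transfer_read %src[%i, %j]
        {permutation_map: (d0, d1) -> (d0, d1)} : memref<?x?xf32>, vector<8x16xf32>
// %pad is an f32 value used to pad out-of-bounds accesses.
%v1 = vector.transfer_read %src[%i, %j], (%pad)
        {permutation_map: (d0, d1) -> (d0, d1)} : memref<?x?xf32>, vector<8x16xf32>
```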
@ -204,7 +181,7 @@ bool VectorTransferReadOp::verify() {
// Consistency of vector type in function type.
if (!getResult()->getType().isa<VectorType>()) {
return emitOpError("should have a vector result type in function type: "
"(memref_type [, elemental_type]) -> vector_type");
"memref_type<...xelemental_type>, vector_type");
}
// Consistency of elemental types in memref and vector.
MemRefType memrefType = getMemRefType();
@ -219,8 +196,9 @@ bool VectorTransferReadOp::verify() {
(optionalPaddingValue ? 1 : 0);
// Checks on the actual operands and their types.
if (getNumOperands() != expectedNumOperands) {
return emitOpError("expects " + Twine(expectedNumOperands) +
" operands to match the types");
return emitOpError("expects " + Twine(expectedNumOperands) + " operands " +
"(of which " + Twine(memrefType.getRank()) +
" indices)");
}
// Consistency of padding value with vector type.
if (optionalPaddingValue) {
@ -239,7 +217,7 @@ bool VectorTransferReadOp::verify() {
for (auto *idx : getIndices()) {
if (!idx->getType().isIndex()) {
return emitOpError(
"index to vector_transfer_read must have 'index' type");
"index to vector.transfer_read must have 'index' type");
}
++numIndices;
}
@ -301,63 +279,41 @@ void VectorTransferWriteOp::print(OpAsmPrinter *p) {
*p << getOperationName();
*p << " " << *getVector();
*p << ", " << *getMemRef();
*p << ", ";
*p << "[";
p->printOperands(getIndices());
*p << "]";
p->printOptionalAttrDict(getAttrs());
Type indexType = (*getIndices().begin())->getType();
*p << " : ";
p->printType(getVectorType());
*p << ", ";
p->printType(getMemRefType());
for (unsigned r = 0, n = getMemRefType().getRank(); r < n; ++r) {
*p << ", ";
p->printType(indexType);
}
}
bool VectorTransferWriteOp::parse(OpAsmParser *parser, OperationState *result) {
SmallVector<OpAsmParser::OperandType, 8> parsedOperands;
SmallVector<Type, 8> types;
// Parsing with support for optional paddingValue.
auto fail = parser->parseOperandList(parsedOperands) ||
parser->parseOptionalAttributeDict(result->attributes) ||
parser->parseColonTypeList(types);
if (fail) {
OpAsmParser::OperandType storeValueInfo;
OpAsmParser::OperandType memrefInfo;
SmallVector<OpAsmParser::OperandType, 4> indexInfo;
SmallVector<Type, 2> types;
auto indexType = parser->getBuilder().getIndexType();
if (parser->parseOperand(storeValueInfo) || parser->parseComma() ||
parser->parseOperand(memrefInfo) ||
parser->parseOperandList(indexInfo, -1, OpAsmParser::Delimiter::Square) ||
parser->parseOptionalAttributeDict(result->attributes) ||
parser->parseColonTypeList(types))
return true;
}
// Resolution.
if (parsedOperands.size() != types.size())
return parser->emitError(
parser->getNameLoc(),
"requires number of operands and input types to match");
if (parsedOperands.size() < Offsets::FirstIndexOffset)
return parser->emitError(parser->getNameLoc(),
"requires at least vector and memref operands");
if (types.size() != 2)
return parser->emitError(parser->getNameLoc(), "expected 2 types");
VectorType vectorType = types[Offsets::VectorOffset].dyn_cast<VectorType>();
if (!vectorType)
return parser->emitError(parser->getNameLoc(),
"Vector type expected for first input type");
return parser->emitError(parser->getNameLoc(), "vector type expected");
MemRefType memrefType = types[Offsets::MemRefOffset].dyn_cast<MemRefType>();
if (!memrefType)
return parser->emitError(parser->getNameLoc(),
"MemRef type expected for second input type");
return parser->emitError(parser->getNameLoc(), "memRef type expected");
unsigned expectedNumOperands =
Offsets::FirstIndexOffset + memrefType.getRank();
if (parsedOperands.size() != expectedNumOperands)
return parser->emitError(parser->getNameLoc(),
"requires " + Twine(expectedNumOperands) +
" operands");
OpAsmParser::OperandType vectorInfo = parsedOperands[Offsets::VectorOffset];
OpAsmParser::OperandType memrefInfo = parsedOperands[Offsets::MemRefOffset];
SmallVector<OpAsmParser::OperandType, 8> indexInfo{
parsedOperands.begin() + Offsets::FirstIndexOffset, parsedOperands.end()};
auto indexType = parser->getBuilder().getIndexType();
return parser->resolveOperand(vectorInfo, vectorType, result->operands) ||
parser->resolveOperand(memrefInfo, memrefType, result->operands) ||
return parser->resolveOperands(storeValueInfo, vectorType,
result->operands) ||
parser->resolveOperands(memrefInfo, memrefType, result->operands) ||
parser->resolveOperands(indexInfo, indexType, result->operands);
}
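
Likewise, a minimal sketch of the write form this parser is meant to round-trip (identifiers are illustrative, not taken from the tests):

```
vector.transfer_write %val, %A[%i0, %i1] {permutation_map: (d0, d1)->(d1)} : vector<128xf32>, memref<?x?xf32>
```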
@ -386,15 +342,16 @@ bool VectorTransferWriteOp::verify() {
Offsets::FirstIndexOffset + memrefType.getRank();
// Checks on the actual operands and their types.
if (getNumOperands() != expectedNumOperands) {
return emitOpError("expects " + Twine(expectedNumOperands) +
" operands to match the types");
return emitOpError("expects " + Twine(expectedNumOperands) + " operands " +
"(of which " + Twine(memrefType.getRank()) +
" indices)");
}
// Consistency of indices types.
unsigned numIndices = 0;
for (auto *idx : getIndices()) {
if (!idx->getType().isIndex()) {
return emitOpError(
"index to vector_transfer_write must have 'index' type");
"index to vector.transfer_write must have 'index' type");
}
numIndices++;
}

View File

@ -565,10 +565,10 @@ TEST_FUNC(vectorize_2d) {
// CHECK-NEXT: affine.for %i0 = 0 to (d0) -> (d0)(%[[M]]) {
// CHECK-NEXT: affine.for %i1 = 0 to (d0) -> (d0)(%[[N]]) step 4 {
// CHECK-NEXT: affine.for %i2 = 0 to (d0) -> (d0)(%[[P]]) step 4 {
// CHECK-NEXT: %[[vA:.*]] = "vector_transfer_read"(%arg1, %i0, %i1, %i2) {permutation_map: (d0, d1, d2) -> (d1, d2)} : (memref<?x?x?xf32>, index, index, index) -> vector<4x4xf32>
// CHECK-NEXT: %[[vB:.*]] = "vector_transfer_read"(%arg0, %i0, %i1, %i2) {permutation_map: (d0, d1, d2) -> (d1, d2)} : (memref<?x?x?xf32>, index, index, index) -> vector<4x4xf32>
// CHECK-NEXT: %[[vA:.*]] = "vector.transfer_read"(%arg1, %i0, %i1, %i2) {permutation_map: (d0, d1, d2) -> (d1, d2)} : (memref<?x?x?xf32>, index, index, index) -> vector<4x4xf32>
// CHECK-NEXT: %[[vB:.*]] = "vector.transfer_read"(%arg0, %i0, %i1, %i2) {permutation_map: (d0, d1, d2) -> (d1, d2)} : (memref<?x?x?xf32>, index, index, index) -> vector<4x4xf32>
// CHECK-NEXT: %[[vRES:.*]] = addf %[[vB]], %[[vA]] : vector<4x4xf32>
// CHECK-NEXT: "vector_transfer_write"(%[[vRES:.*]], %arg2, %i0, %i1, %i2) {permutation_map: (d0, d1, d2) -> (d1, d2)} : (vector<4x4xf32>, memref<?x?x?xf32>, index, index, index) -> ()
// CHECK-NEXT: "vector.transfer_write"(%[[vRES:.*]], %arg2, %i0, %i1, %i2) {permutation_map: (d0, d1, d2) -> (d1, d2)} : (vector<4x4xf32>, memref<?x?x?xf32>, index, index, index) -> ()
// clang-format on
mlir::PassManager pm;

View File

@ -330,22 +330,22 @@ func @test_dimop(%arg0: tensor<4x4x?xf32>) {
}
// CHECK-LABEL: func @test_vector_transfer_ops(%arg0
func @test_vector_transfer_ops(%arg0: memref<?x?xf32>) {
// CHECK-LABEL: func @test_vector.transfer_ops(%arg0
func @test_vector.transfer_ops(%arg0: memref<?x?xf32>) {
%c3 = constant 3 : index
%cst = constant 3.0 : f32
// CHECK: %0 = vector_transfer_read %arg0, %c3, %c3 {permutation_map: #[[map_proj_d0d1_d0]]} : (memref<?x?xf32>, index, index) -> vector<128xf32>
%0 = vector_transfer_read %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d0)} : (memref<?x?xf32>, index, index) -> vector<128xf32>
// CHECK: %1 = vector_transfer_read %arg0, %c3, %c3 {permutation_map: #[[map_proj_d0d1_d1d0]]} : (memref<?x?xf32>, index, index) -> vector<3x7xf32>
%1 = vector_transfer_read %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d1, d0)} : (memref<?x?xf32>, index, index) -> vector<3x7xf32>
// CHECK: %2 = vector_transfer_read %arg0, %c3, %c3, %cst {permutation_map: #[[map_proj_d0d1_d0]]} : (memref<?x?xf32>, index, index, f32) -> vector<128xf32>
%2 = vector_transfer_read %arg0, %c3, %c3, %cst {permutation_map: (d0, d1)->(d0)} : (memref<?x?xf32>, index, index, f32) -> vector<128xf32>
// CHECK: %3 = vector_transfer_read %arg0, %c3, %c3, %cst {permutation_map: #[[map_proj_d0d1_d1]]} : (memref<?x?xf32>, index, index, f32) -> vector<128xf32>
%3 = vector_transfer_read %arg0, %c3, %c3, %cst {permutation_map: (d0, d1)->(d1)} : (memref<?x?xf32>, index, index, f32) -> vector<128xf32>
// CHECK: %0 = vector.transfer_read %arg0[%c3, %c3] {permutation_map: #[[map_proj_d0d1_d0]]} : memref<?x?xf32>, vector<128xf32>
%0 = vector.transfer_read %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0)} : memref<?x?xf32>, vector<128xf32>
// CHECK: %1 = vector.transfer_read %arg0[%c3, %c3] {permutation_map: #[[map_proj_d0d1_d1d0]]} : memref<?x?xf32>, vector<3x7xf32>
%1 = vector.transfer_read %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d1, d0)} : memref<?x?xf32>, vector<3x7xf32>
// CHECK: %2 = vector.transfer_read %arg0[%c3, %c3], (%cst) {permutation_map: #[[map_proj_d0d1_d0]]} : memref<?x?xf32>, vector<128xf32>
%2 = vector.transfer_read %arg0[%c3, %c3], (%cst) {permutation_map: (d0, d1)->(d0)} : memref<?x?xf32>, vector<128xf32>
// CHECK: %3 = vector.transfer_read %arg0[%c3, %c3], (%cst) {permutation_map: #[[map_proj_d0d1_d1]]} : memref<?x?xf32>, vector<128xf32>
%3 = vector.transfer_read %arg0[%c3, %c3], (%cst) {permutation_map: (d0, d1)->(d1)} : memref<?x?xf32>, vector<128xf32>
//
// CHECK: vector_transfer_write %0, %arg0, %c3, %c3 {permutation_map: #[[map_proj_d0d1_d0]]} : vector<128xf32>, memref<?x?xf32>, index, index
vector_transfer_write %0, %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d0)} : vector<128xf32>, memref<?x?xf32>, index, index
// CHECK: vector_transfer_write %1, %arg0, %c3, %c3 {permutation_map: #[[map_proj_d0d1_d1d0]]} : vector<3x7xf32>, memref<?x?xf32>, index, index
vector_transfer_write %1, %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d1, d0)} : vector<3x7xf32>, memref<?x?xf32>, index, index
// CHECK: vector.transfer_write %0, %arg0[%c3, %c3] {permutation_map: #[[map_proj_d0d1_d0]]} : vector<128xf32>, memref<?x?xf32>
vector.transfer_write %0, %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0)} : vector<128xf32>, memref<?x?xf32>
// CHECK: vector.transfer_write %1, %arg0[%c3, %c3] {permutation_map: #[[map_proj_d0d1_d1d0]]} : vector<3x7xf32>, memref<?x?xf32>
vector.transfer_write %1, %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d1, d0)} : vector<3x7xf32>, memref<?x?xf32>
return
}

View File

@ -296,180 +296,181 @@ func @func_with_ops(tensor<?xi1>, tensor<42xi32>, tensor<42xi32>) {
// -----
func @test_vector_transfer_read(memref<?x?xf32>) {
func @test_vector.transfer_read(memref<?x?xf32>) {
^bb0(%arg0: memref<?x?xf32>):
%c3 = constant 3 : index
%cst = constant 3.0 : f32
// expected-error@+1 {{expected 4 operand types but had 3}}
%0 = "vector_transfer_read"(%arg0, %c3, %c3, %c3) : (memref<?x?xf32>, index, index) -> vector<128xf32>
// expected-error@+1 {{expected 2 types}}
%0 = vector.transfer_read %arg0[%c3, %c3] : memref<?x?xf32>
}
// -----
func @test_vector_transfer_read(memref<?x?xf32>) {
func @test_vector.transfer_read(memref<?x?xf32>) {
^bb0(%arg0: memref<?x?xf32>):
%c3 = constant 3 : index
%cst = constant 3.0 : f32
// expected-error@+1 {{requires 3 operands}}
%0 = vector_transfer_read %arg0, %c3, %c3, %c3 : (memref<?x?xf32>, index, index) -> vector<128xf32>
// expected-error@+1 {{expected 2 indices to the memref}}
%0 = vector.transfer_read %arg0[%c3, %c3, %c3] : memref<?x?xf32>, vector<128xf32>
}
// -----
func @test_vector_transfer_read(memref<?x?xf32>) {
func @test_vector.transfer_read(memref<?x?xf32>) {
^bb0(%arg0: memref<?x?xf32>):
%c3 = constant 3 : index
%cst = constant 3.0 : f32
// expected-error@+1 {{requires an AffineMapAttr named 'permutation_map'}}
%0 = vector_transfer_read %arg0, %c3, %c3 : (memref<?x?xf32>, index, index) -> vector<128xf32>
%0 = vector.transfer_read %arg0[%c3, %c3] : memref<?x?xf32>, vector<128xf32>
}
// -----
func @test_vector_transfer_read(memref<?x?xf32>) {
func @test_vector.transfer_read(memref<?x?xf32>) {
^bb0(%arg0: memref<?x?xf32>):
%c3 = constant 3 : index
%cst = constant 3.0 : f32
// expected-error@+1 {{requires an AffineMapAttr named 'permutation_map'}}
%0 = vector_transfer_read %arg0, %c3, %c3 {perm: (d0)->(d0)} : (memref<?x?xf32>, index, index) -> vector<128xf32>
%0 = vector.transfer_read %arg0[%c3, %c3] {perm: (d0)->(d0)} : memref<?x?xf32>, vector<128xf32>
}
// -----
func @test_vector_transfer_read(memref<?x?xf32>) {
func @test_vector.transfer_read(memref<?x?xf32>) {
^bb0(%arg0: memref<?x?xf32>):
%c3 = constant 3 : index
%cst = constant 3.0 : f32
// expected-error@+1 {{requires a permutation_map with input dims of the same rank as the memref type}}
%0 = vector_transfer_read %arg0, %c3, %c3 {permutation_map: (d0)->(d0)} : (memref<?x?xf32>, index, index) -> vector<128xf32>
%0 = vector.transfer_read %arg0[%c3, %c3] {permutation_map: (d0)->(d0)} : memref<?x?xf32>, vector<128xf32>
}
// -----
func @test_vector_transfer_read(memref<?x?xf32>) {
func @test_vector.transfer_read(memref<?x?xf32>) {
^bb0(%arg0: memref<?x?xf32>):
%c3 = constant 3 : index
%cst = constant 3.0 : f32
// expected-error@+1 {{requires a permutation_map with result dims of the same rank as the vector type}}
%0 = vector_transfer_read %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d0, d1)} : (memref<?x?xf32>, index, index) -> vector<128xf32>
%0 = vector.transfer_read %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0, d1)} : memref<?x?xf32>, vector<128xf32>
}
// -----
func @test_vector_transfer_read(memref<?x?xf32>) {
func @test_vector.transfer_read(memref<?x?xf32>) {
^bb0(%arg0: memref<?x?xf32>):
%c3 = constant 3 : index
%cst = constant 3.0 : f32
// expected-error@+1 {{requires a projected permutation_map (at most one dim or the zero constant can appear in each result)}}
%0 = vector_transfer_read %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d0 + d1)} : (memref<?x?xf32>, index, index) -> vector<128xf32>
%0 = vector.transfer_read %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0 + d1)} : memref<?x?xf32>, vector<128xf32>
}
// -----
func @test_vector_transfer_read(memref<?x?xf32>) {
func @test_vector.transfer_read(memref<?x?xf32>) {
^bb0(%arg0: memref<?x?xf32>):
%c3 = constant 3 : index
%cst = constant 3.0 : f32
// expected-error@+1 {{requires a projected permutation_map (at most one dim or the zero constant can appear in each result)}}
%0 = vector_transfer_read %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d0 + 1)} : (memref<?x?xf32>, index, index) -> vector<128xf32>
%0 = vector.transfer_read %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0 + 1)} : memref<?x?xf32>, vector<128xf32>
}
// -----
func @test_vector_transfer_read(memref<?x?x?xf32>) {
func @test_vector.transfer_read(memref<?x?x?xf32>) {
^bb0(%arg0: memref<?x?x?xf32>):
%c3 = constant 3 : index
%cst = constant 3.0 : f32
// expected-error@+1 {{requires a permutation_map that is a permutation (found one dim used more than once)}}
%0 = vector_transfer_read %arg0, %c3, %c3, %c3 {permutation_map: (d0, d1, d2)->(d0, d0)} : (memref<?x?x?xf32>, index, index, index) -> vector<3x7xf32>
%0 = vector.transfer_read %arg0[%c3, %c3, %c3] {permutation_map: (d0, d1, d2)->(d0, d0)} : memref<?x?x?xf32>, vector<3x7xf32>
}
// -----
func @test_vector_transfer_write(memref<?x?xf32>) {
func @test_vector.transfer_write(memref<?x?xf32>) {
^bb0(%arg0: memref<?x?xf32>):
%c3 = constant 3 : index
%cst = constant splat<vector<128 x f32>, 3.0> : vector<128 x f32>
// expected-error@+1 {{expected 5 operand types but had 4}}
%0 = "vector_transfer_write"(%cst, %arg0, %c3, %c3, %c3) : (vector<128xf32>, memref<?x?xf32>, index, index) -> ()
%0 = "vector.transfer_write"(%cst, %arg0, %c3, %c3, %c3) : (vector<128xf32>, memref<?x?xf32>, index, index) -> ()
}
// -----
func @test_vector_transfer_write(memref<?x?xf32>) {
func @test_vector.transfer_write(memref<?x?xf32>) {
^bb0(%arg0: memref<?x?xf32>):
%c3 = constant 3 : index
%cst = constant splat<vector<128 x f32>, 3.0> : vector<128 x f32>
// expected-error@+1 {{requires number of operands and input types to match}}
vector_transfer_write %cst, %arg0, %c3, %c3, %c3 : vector<128xf32>, memref<?x?xf32>, index, index
// expected-error@+1 {{expects 4 operands (of which 2 indices)}}
vector.transfer_write %cst, %arg0[%c3, %c3, %c3] : vector<128xf32>, memref<?x?xf32>
}
// -----
func @test_vector_transfer_write(memref<?x?xf32>) {
func @test_vector.transfer_write(memref<?x?xf32>) {
^bb0(%arg0: memref<?x?xf32>):
%c3 = constant 3 : index
%cst = constant splat<vector<128 x f32>, 3.0> : vector<128 x f32>
// expected-error@+1 {{requires an AffineMapAttr named 'permutation_map'}}
vector_transfer_write %cst, %arg0, %c3, %c3 : vector<128xf32>, memref<?x?xf32>, index, index
vector.transfer_write %cst, %arg0[%c3, %c3] : vector<128xf32>, memref<?x?xf32>
}
// -----
func @test_vector_transfer_write(memref<?x?xf32>) {
func @test_vector.transfer_write(memref<?x?xf32>) {
^bb0(%arg0: memref<?x?xf32>):
%c3 = constant 3 : index
%cst = constant splat<vector<128 x f32>, 3.0> : vector<128 x f32>
// expected-error@+1 {{requires an AffineMapAttr named 'permutation_map'}}
vector_transfer_write %cst, %arg0, %c3, %c3 {perm: (d0)->(d0)} : vector<128xf32>, memref<?x?xf32>, index, index
vector.transfer_write %cst, %arg0[%c3, %c3] {perm: (d0)->(d0)} : vector<128xf32>, memref<?x?xf32>
}
// -----
func @test_vector_transfer_write(memref<?x?xf32>) {
func @test_vector.transfer_write(memref<?x?xf32>) {
^bb0(%arg0: memref<?x?xf32>):
%c3 = constant 3 : index
%cst = constant splat<vector<128 x f32>, 3.0> : vector<128 x f32>
// expected-error@+1 {{requires a permutation_map with input dims of the same rank as the memref type}}
vector_transfer_write %cst, %arg0, %c3, %c3 {permutation_map: (d0)->(d0)} : vector<128xf32>, memref<?x?xf32>, index, index
vector.transfer_write %cst, %arg0[%c3, %c3] {permutation_map: (d0)->(d0)} : vector<128xf32>, memref<?x?xf32>
}
// -----
func @test_vector_transfer_write(memref<?x?xf32>) {
func @test_vector.transfer_write(memref<?x?xf32>) {
^bb0(%arg0: memref<?x?xf32>):
%c3 = constant 3 : index
%cst = constant splat<vector<128 x f32>, 3.0> : vector<128 x f32>
// expected-error@+1 {{requires a permutation_map with result dims of the same rank as the vector type}}
vector_transfer_write %cst, %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d0, d1)} : vector<128xf32>, memref<?x?xf32>, index, index
vector.transfer_write %cst, %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0, d1)} : vector<128xf32>, memref<?x?xf32>
}
// -----
func @test_vector_transfer_write(memref<?x?xf32>) {
func @test_vector.transfer_write(memref<?x?xf32>) {
^bb0(%arg0: memref<?x?xf32>):
%c3 = constant 3 : index
%cst = constant splat<vector<128 x f32>, 3.0> : vector<128 x f32>
// expected-error@+1 {{requires a projected permutation_map (at most one dim or the zero constant can appear in each result)}}
vector_transfer_write %cst, %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d0 + d1)} : vector<128xf32>, memref<?x?xf32>, index, index
vector.transfer_write %cst, %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0 + d1)} : vector<128xf32>, memref<?x?xf32>
}
// -----
func @test_vector_transfer_write(memref<?x?xf32>) {
func @test_vector.transfer_write(memref<?x?xf32>) {
^bb0(%arg0: memref<?x?xf32>):
%c3 = constant 3 : index
%cst = constant splat<vector<128 x f32>, 3.0> : vector<128 x f32>
// expected-error@+1 {{requires a projected permutation_map (at most one dim or the zero constant can appear in each result)}}
vector_transfer_write %cst, %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d0 + 1)} : vector<128xf32>, memref<?x?xf32>, index, index
vector.transfer_write %cst, %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0 + 1)} : vector<128xf32>, memref<?x?xf32>
}
// -----
func @test_vector_transfer_write(memref<?x?x?xf32>) {
func @test_vector.transfer_write(memref<?x?x?xf32>) {
^bb0(%arg0: memref<?x?x?xf32>):
%c3 = constant 3 : index
%cst = constant splat<vector<3 x 7 x f32>, 3.0> : vector<3 x 7 x f32>
// expected-error@+1 {{requires a permutation_map that is a permutation (found one dim used more than once)}}
vector_transfer_write %cst, %arg0, %c3, %c3, %c3 {permutation_map: (d0, d1, d2)->(d0, d0)} : vector<3x7xf32>, memref<?x?x?xf32>, index, index, index
vector.transfer_write %cst, %arg0[%c3, %c3, %c3] {permutation_map: (d0, d1, d2)->(d0, d0)} : vector<3x7xf32>, memref<?x?x?xf32>
}
// -----

View File

@ -8,13 +8,13 @@ func @materialize_read_1d() {
%A = alloc () : memref<7x42xf32>
affine.for %i0 = 0 to 7 step 4 {
affine.for %i1 = 0 to 42 step 4 {
%f1 = vector_transfer_read %A, %i0, %i1 {permutation_map: (d0, d1) -> (d0)} : (memref<7x42xf32>, index, index) -> vector<4xf32>
%f1 = vector.transfer_read %A[%i0, %i1] {permutation_map: (d0, d1) -> (d0)} : memref<7x42xf32>, vector<4xf32>
%ip1 = affine.apply (d0) -> (d0 + 1) (%i1)
%f2 = vector_transfer_read %A, %i0, %ip1 {permutation_map: (d0, d1) -> (d0)} : (memref<7x42xf32>, index, index) -> vector<4xf32>
%f2 = vector.transfer_read %A[%i0, %ip1] {permutation_map: (d0, d1) -> (d0)} : memref<7x42xf32>, vector<4xf32>
%ip2 = affine.apply (d0) -> (d0 + 2) (%i1)
%f3 = vector_transfer_read %A, %i0, %ip2 {permutation_map: (d0, d1) -> (d0)} : (memref<7x42xf32>, index, index) -> vector<4xf32>
%f3 = vector.transfer_read %A[%i0, %ip2] {permutation_map: (d0, d1) -> (d0)} : memref<7x42xf32>, vector<4xf32>
%ip3 = affine.apply (d0) -> (d0 + 3) (%i1)
%f4 = vector_transfer_read %A, %i0, %ip3 {permutation_map: (d0, d1) -> (d0)} : (memref<7x42xf32>, index, index) -> vector<4xf32>
%f4 = vector.transfer_read %A[%i0, %ip3] {permutation_map: (d0, d1) -> (d0)} : memref<7x42xf32>, vector<4xf32>
// Both accesses in the load must be clipped, otherwise %i1 + 2 and %i1 + 3 will go out of bounds.
// CHECK: {{.*}} = select
// CHECK: %[[FILTERED1:.*]] = select
@ -34,9 +34,9 @@ func @materialize_read_1d_partially_specialized(%dyn1 : index, %dyn2 : index, %d
affine.for %i2 = 0 to %dyn2 {
affine.for %i3 = 0 to 42 step 2 {
affine.for %i4 = 0 to %dyn4 {
%f1 = vector_transfer_read %A, %i0, %i1, %i2, %i3, %i4 {permutation_map: (d0, d1, d2, d3, d4) -> (d3)} : ( memref<7x?x?x42x?xf32>, index, index, index, index, index) -> vector<4xf32>
%f1 = vector.transfer_read %A[%i0, %i1, %i2, %i3, %i4] {permutation_map: (d0, d1, d2, d3, d4) -> (d3)} : memref<7x?x?x42x?xf32>, vector<4xf32>
%i3p1 = affine.apply (d0) -> (d0 + 1) (%i3)
%f2 = vector_transfer_read %A, %i0, %i1, %i2, %i3p1, %i4 {permutation_map: (d0, d1, d2, d3, d4) -> (d3)} : ( memref<7x?x?x42x?xf32>, index, index, index, index, index) -> vector<4xf32>
%f2 = vector.transfer_read %A[%i0, %i1, %i2, %i3p1, %i4] {permutation_map: (d0, d1, d2, d3, d4) -> (d3)} : memref<7x?x?x42x?xf32>, vector<4xf32>
}
}
}
@ -63,7 +63,7 @@ func @materialize_read(%M: index, %N: index, %O: index, %P: index) {
// CHECK-NEXT: %[[D2:.*]] = dim %0, 2 : memref<?x?x?x?xf32>
// CHECK-NEXT: %[[D3:.*]] = dim %0, 3 : memref<?x?x?x?xf32>
// CHECK: %[[ALLOC:.*]] = alloc() : memref<5x4x3xf32>
// CHECK-NEXT: %[[VECTOR_VIEW:.*]] = vector_type_cast %[[ALLOC]] : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>>
// CHECK-NEXT: %[[VECTOR_VIEW:.*]] = vector.type_cast %[[ALLOC]] : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>>
// CHECK-NEXT: affine.for %[[I4:.*]] = 0 to 3 {
// CHECK-NEXT: affine.for %[[I5:.*]] = 0 to 4 {
// CHECK-NEXT: affine.for %[[I6:.*]] = 0 to 5 {
@ -121,7 +121,7 @@ func @materialize_read(%M: index, %N: index, %O: index, %P: index) {
affine.for %i1 = 0 to %N {
affine.for %i2 = 0 to %O {
affine.for %i3 = 0 to %P step 5 {
%f = vector_transfer_read %A, %i0, %i1, %i2, %i3 {permutation_map: (d0, d1, d2, d3) -> (d3, 0, d0)} : (memref<?x?x?x?xf32, 0>, index, index, index, index) -> vector<5x4x3xf32>
%f = vector.transfer_read %A[%i0, %i1, %i2, %i3] {permutation_map: (d0, d1, d2, d3) -> (d3, 0, d0)} : memref<?x?x?x?xf32>, vector<5x4x3xf32>
}
}
}
@ -142,7 +142,7 @@ func @materialize_write(%M: index, %N: index, %O: index, %P: index) {
// CHECK-NEXT: %[[D2:.*]] = dim %0, 2 : memref<?x?x?x?xf32>
// CHECK-NEXT: %[[D3:.*]] = dim %0, 3 : memref<?x?x?x?xf32>
// CHECK: %[[ALLOC:.*]] = alloc() : memref<5x4x3xf32>
// CHECK-NEXT: %[[VECTOR_VIEW:.*]] = vector_type_cast {{.*}} : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>>
// CHECK-NEXT: %[[VECTOR_VIEW:.*]] = vector.type_cast {{.*}} : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>>
// CHECK: store %cst, {{.*}} : memref<1xvector<5x4x3xf32>>
// CHECK-NEXT: affine.for %[[I4:.*]] = 0 to 3 {
// CHECK-NEXT: affine.for %[[I5:.*]] = 0 to 4 {
@ -205,7 +205,7 @@ func @materialize_write(%M: index, %N: index, %O: index, %P: index) {
affine.for %i1 = 0 to %N step 4 {
affine.for %i2 = 0 to %O {
affine.for %i3 = 0 to %P step 5 {
vector_transfer_write %f1, %A, %i0, %i1, %i2, %i3 {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d0)} : vector<5x4x3xf32>, memref<?x?x?x?xf32, 0>, index, index, index, index
vector.transfer_write %f1, %A[%i0, %i1, %i2, %i3] {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d0)} : vector<5x4x3xf32>, memref<?x?x?x?xf32>
}
}
}

View File

@ -18,18 +18,18 @@ func @materialize(%M : index, %N : index, %O : index, %P : index) {
// CHECK-NEXT: %[[b:[0-9]+]] = {{.*}}[[ID1]](%i1)
// CHECK-NEXT: %[[c:[0-9]+]] = {{.*}}[[ID1]](%i2)
// CHECK-NEXT: %[[d:[0-9]+]] = {{.*}}[[ID1]](%i3)
// CHECK-NEXT: vector_transfer_write {{.*}}, %0, %[[a]], %[[b]], %[[c]], %[[d]] {permutation_map: #[[D0D1D2D3TOD1D0]]} : vector<4x4xf32>, memref<?x?x?x?xf32>, index, index, index, index
// CHECK-NEXT: vector.transfer_write {{.*}}, %0[%[[a]], %[[b]], %[[c]], %[[d]]] {permutation_map: #[[D0D1D2D3TOD1D0]]} : vector<4x4xf32>, memref<?x?x?x?xf32>
// CHECK: %[[b1:[0-9]+]] = {{.*}}[[D0P1]](%i1)
// CHECK: vector_transfer_write {{.*}}, %0, {{.*}}, %[[b1]], {{.*}} {permutation_map: #[[D0D1D2D3TOD1D0]]} : vector<4x4xf32>, memref<?x?x?x?xf32>, index, index, index, index
// CHECK: vector.transfer_write {{.*}}, %0[{{.*}}, %[[b1]], {{.*}}] {permutation_map: #[[D0D1D2D3TOD1D0]]} : vector<4x4xf32>, memref<?x?x?x?xf32>
// CHECK: %[[b2:[0-9]+]] = {{.*}}[[D0P2]](%i1)
// CHECK: vector_transfer_write {{.*}}, %0, {{.*}}, %[[b2]], {{.*}} {permutation_map: #[[D0D1D2D3TOD1D0]]} : vector<4x4xf32>, memref<?x?x?x?xf32>, index, index, index, index
// CHECK: vector.transfer_write {{.*}}, %0[{{.*}}, %[[b2]], {{.*}}] {permutation_map: #[[D0D1D2D3TOD1D0]]} : vector<4x4xf32>, memref<?x?x?x?xf32>
// CHECK: %[[b3:[0-9]+]] = {{.*}}[[D0P3]](%i1)
// CHECK: vector_transfer_write {{.*}}, %0, {{.*}}, %[[b3]], {{.*}} {permutation_map: #[[D0D1D2D3TOD1D0]]} : vector<4x4xf32>, memref<?x?x?x?xf32>, index, index, index, index
// CHECK: vector.transfer_write {{.*}}, %0[{{.*}}, %[[b3]], {{.*}}] {permutation_map: #[[D0D1D2D3TOD1D0]]} : vector<4x4xf32>, memref<?x?x?x?xf32>
affine.for %i0 = 0 to %M step 4 {
affine.for %i1 = 0 to %N step 4 {
affine.for %i2 = 0 to %O {
affine.for %i3 = 0 to %P step 4 {
"vector_transfer_write"(%f1, %A, %i0, %i1, %i2, %i3) {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d0)} : (vector<4x4x4xf32>, memref<?x?x?x?xf32, 0>, index, index, index, index) -> ()
vector.transfer_write %f1, %A[%i0, %i1, %i2, %i3] {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d0)} : vector<4x4x4xf32>, memref<?x?x?x?xf32, 0>
}
}
}

View File

@ -17,22 +17,22 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
// 4x unroll (jammed by construction).
// CHECK: affine.for %i0 = 0 to %arg0 {
// CHECK-NEXT: affine.for %i1 = 0 to %arg1 step 32 {
// CHECK-NEXT: [[CST0:%.*]] = constant splat<vector<8xf32>, 1.000000e+00> : vector<8xf32>
// CHECK-NEXT: [[CST1:%.*]] = constant splat<vector<8xf32>, 1.000000e+00> : vector<8xf32>
// CHECK-NEXT: [[CST2:%.*]] = constant splat<vector<8xf32>, 1.000000e+00> : vector<8xf32>
// CHECK-NEXT: [[CST3:%.*]] = constant splat<vector<8xf32>, 1.000000e+00> : vector<8xf32>
// CHECK-NEXT: [[VAL00:%.*]] = affine.apply [[ID1]]{{.*}}
// CHECK-NEXT: [[VAL01:%.*]] = affine.apply [[ID1]]{{.*}}
// CHECK-NEXT: vector_transfer_write [[CST0]], {{.*}}, [[VAL00]], [[VAL01]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
// CHECK-NEXT: [[VAL10:%.*]] = affine.apply [[ID1]]{{.*}}
// CHECK-NEXT: [[VAL11:%.*]] = affine.apply [[D0P8]]{{.*}}
// CHECK-NEXT: vector_transfer_write [[CST1]], {{.*}}, [[VAL10]], [[VAL11]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
// CHECK-NEXT: [[VAL20:%.*]] = affine.apply [[ID1]]{{.*}}
// CHECK-NEXT: [[VAL21:%.*]] = affine.apply [[D0P16]]{{.*}}
// CHECK-NEXT: vector_transfer_write [[CST2]], {{.*}}, [[VAL20]], [[VAL21]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
// CHECK-NEXT: [[VAL30:%.*]] = affine.apply [[ID1]]{{.*}}
// CHECK-NEXT: [[VAL31:%.*]] = affine.apply [[D0P24]]{{.*}}
// CHECK-NEXT: vector_transfer_write [[CST3]], {{.*}}, [[VAL30]], [[VAL31]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
// CHECK-NEXT: %[[CST0:.*]] = constant splat<vector<8xf32>, 1.000000e+00> : vector<8xf32>
// CHECK-NEXT: %[[CST1:.*]] = constant splat<vector<8xf32>, 1.000000e+00> : vector<8xf32>
// CHECK-NEXT: %[[CST2:.*]] = constant splat<vector<8xf32>, 1.000000e+00> : vector<8xf32>
// CHECK-NEXT: %[[CST3:.*]] = constant splat<vector<8xf32>, 1.000000e+00> : vector<8xf32>
// CHECK-NEXT: %[[VAL00:.*]] = affine.apply [[ID1]]{{.*}}
// CHECK-NEXT: %[[VAL01:.*]] = affine.apply [[ID1]]{{.*}}
// CHECK-NEXT: vector.transfer_write %[[CST0]], {{.*}}[%[[VAL00]], %[[VAL01]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
// CHECK-NEXT: %[[VAL10:.*]] = affine.apply [[ID1]]{{.*}}
// CHECK-NEXT: %[[VAL11:.*]] = affine.apply [[D0P8]]{{.*}}
// CHECK-NEXT: vector.transfer_write %[[CST1]], {{.*}}[%[[VAL10]], %[[VAL11]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
// CHECK-NEXT: %[[VAL20:.*]] = affine.apply [[ID1]]{{.*}}
// CHECK-NEXT: %[[VAL21:.*]] = affine.apply [[D0P16]]{{.*}}
// CHECK-NEXT: vector.transfer_write %[[CST2]], {{.*}}[%[[VAL20]], %[[VAL21]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
// CHECK-NEXT: %[[VAL30:.*]] = affine.apply [[ID1]]{{.*}}
// CHECK-NEXT: %[[VAL31:.*]] = affine.apply [[D0P24]]{{.*}}
// CHECK-NEXT: vector.transfer_write %[[CST3]], {{.*}}[%[[VAL30]], %[[VAL31]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
//
affine.for %i0 = 0 to %M {
affine.for %i1 = 0 to %N {
@ -43,22 +43,22 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
// 4x unroll (jammed by construction).
// CHECK: affine.for %i2 = 0 to %arg0 {
// CHECK-NEXT: affine.for %i3 = 0 to %arg1 step 32 {
// CHECK-NEXT: [[CST0:%.*]] = constant splat<vector<8xf32>, 2.000000e+00> : vector<8xf32>
// CHECK-NEXT: [[CST1:%.*]] = constant splat<vector<8xf32>, 2.000000e+00> : vector<8xf32>
// CHECK-NEXT: [[CST2:%.*]] = constant splat<vector<8xf32>, 2.000000e+00> : vector<8xf32>
// CHECK-NEXT: [[CST3:%.*]] = constant splat<vector<8xf32>, 2.000000e+00> : vector<8xf32>
// CHECK-NEXT: [[VAL00:%.*]] = affine.apply [[ID1]]{{.*}}
// CHECK-NEXT: [[VAL01:%.*]] = affine.apply [[ID1]]{{.*}}
// CHECK-NEXT: vector_transfer_write [[CST0]], {{.*}}, [[VAL00]], [[VAL01]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
// CHECK-NEXT: [[VAL10:%.*]] = affine.apply [[ID1]]{{.*}}
// CHECK-NEXT: [[VAL11:%.*]] = affine.apply [[D0P8]]{{.*}}
// CHECK-NEXT: vector_transfer_write [[CST1]], {{.*}}, [[VAL10]], [[VAL11]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
// CHECK-NEXT: [[VAL20:%.*]] = affine.apply [[ID1]]{{.*}}
// CHECK-NEXT: [[VAL21:%.*]] = affine.apply [[D0P16]]{{.*}}
// CHECK-NEXT: vector_transfer_write [[CST2]], {{.*}}, [[VAL20]], [[VAL21]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
// CHECK-NEXT: [[VAL30:%.*]] = affine.apply [[ID1]]{{.*}}
// CHECK-NEXT: [[VAL31:%.*]] = affine.apply [[D0P24]]{{.*}}
// CHECK-NEXT: vector_transfer_write [[CST3]], {{.*}}, [[VAL30]], [[VAL31]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
// CHECK-NEXT: %[[CST0:.*]] = constant splat<vector<8xf32>, 2.000000e+00> : vector<8xf32>
// CHECK-NEXT: %[[CST1:.*]] = constant splat<vector<8xf32>, 2.000000e+00> : vector<8xf32>
// CHECK-NEXT: %[[CST2:.*]] = constant splat<vector<8xf32>, 2.000000e+00> : vector<8xf32>
// CHECK-NEXT: %[[CST3:.*]] = constant splat<vector<8xf32>, 2.000000e+00> : vector<8xf32>
// CHECK-NEXT: %[[VAL00:.*]] = affine.apply [[ID1]]{{.*}}
// CHECK-NEXT: %[[VAL01:.*]] = affine.apply [[ID1]]{{.*}}
// CHECK-NEXT: vector.transfer_write %[[CST0]], {{.*}}[%[[VAL00]], %[[VAL01]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
// CHECK-NEXT: %[[VAL10:.*]] = affine.apply [[ID1]]{{.*}}
// CHECK-NEXT: %[[VAL11:.*]] = affine.apply [[D0P8]]{{.*}}
// CHECK-NEXT: vector.transfer_write %[[CST1]], {{.*}}[%[[VAL10]], %[[VAL11]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
// CHECK-NEXT: %[[VAL20:.*]] = affine.apply [[ID1]]{{.*}}
// CHECK-NEXT: %[[VAL21:.*]] = affine.apply [[D0P16]]{{.*}}
// CHECK-NEXT: vector.transfer_write %[[CST2]], {{.*}}[%[[VAL20]], %[[VAL21]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
// CHECK-NEXT: %[[VAL30:.*]] = affine.apply [[ID1]]{{.*}}
// CHECK-NEXT: %[[VAL31:.*]] = affine.apply [[D0P24]]{{.*}}
// CHECK-NEXT: vector.transfer_write %[[CST3]], {{.*}}[%[[VAL30]], %[[VAL31]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
//
affine.for %i2 = 0 to %M {
affine.for %i3 = 0 to %N {
@ -71,44 +71,44 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
// CHECK-NEXT: affine.for %i5 = 0 to %arg1 step 32 {
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = vector_transfer_read
// CHECK-NEXT: {{.*}} = vector.transfer_read
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = vector_transfer_read
// CHECK-NEXT: {{.*}} = vector.transfer_read
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = vector_transfer_read
// CHECK-NEXT: {{.*}} = vector.transfer_read
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = vector_transfer_read
// CHECK-NEXT: {{.*}} = vector.transfer_read
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = vector_transfer_read
// CHECK-NEXT: {{.*}} = vector.transfer_read
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = vector_transfer_read
// CHECK-NEXT: {{.*}} = vector.transfer_read
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = vector_transfer_read
// CHECK-NEXT: {{.*}} = vector.transfer_read
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = vector_transfer_read
// CHECK-NEXT: {{.*}} = vector.transfer_read
// CHECK-NEXT: {{.*}} = addf {{.*}} : vector<8xf32>
// CHECK-NEXT: {{.*}} = addf {{.*}} : vector<8xf32>
// CHECK-NEXT: {{.*}} = addf {{.*}} : vector<8xf32>
// CHECK-NEXT: {{.*}} = addf {{.*}} : vector<8xf32>
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: vector_transfer_write
// CHECK-NEXT: vector.transfer_write
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: vector_transfer_write
// CHECK-NEXT: vector.transfer_write
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: vector_transfer_write
// CHECK-NEXT: vector.transfer_write
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: vector_transfer_write
// CHECK-NEXT: vector.transfer_write
//
affine.for %i4 = 0 to %M {
affine.for %i5 = 0 to %N {

View File

@ -23,24 +23,24 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
// CHECK-NEXT: {{.*}} = constant splat<vector<8xf32>, 1.000000e+00> : vector<8xf32>
// CHECK-NEXT: {{.*}} = constant splat<vector<8xf32>, 1.000000e+00> : vector<8xf32>
// CHECK-NEXT: {{.*}} = constant splat<vector<8xf32>, 1.000000e+00> : vector<8xf32>
// CHECK-NEXT: [[VAL00:%.*]] = affine.apply [[ID1]](%i0)
// CHECK-NEXT: [[VAL01:%.*]] = affine.apply [[ID1]](%i1)
// CHECK-NEXT: vector_transfer_write {{.*}}, {{.*}}, [[VAL00]], [[VAL01]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
// CHECK-NEXT: [[VAL10:%.*]] = affine.apply [[ID1]](%i0)
// CHECK-NEXT: [[VAL11:%.*]] = affine.apply [[D0P8]](%i1)
// CHECK-NEXT: vector_transfer_write {{.*}}, {{.*}}, [[VAL10]], [[VAL11]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
// CHECK-NEXT: [[VAL20:%.*]] = affine.apply [[D0P1]](%i0)
// CHECK-NEXT: [[VAL21:%.*]] = affine.apply [[ID1]](%i1)
// CHECK-NEXT: vector_transfer_write {{.*}}, {{.*}}, [[VAL20]], [[VAL21]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
// CHECK-NEXT: [[VAL30:%.*]] = affine.apply [[D0P1]](%i0)
// CHECK-NEXT: [[VAL31:%.*]] = affine.apply [[D0P8]](%i1)
// CHECK-NEXT: vector_transfer_write {{.*}}, {{.*}}, [[VAL30]], [[VAL31]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
// CHECK-NEXT: [[VAL40:%.*]] = affine.apply [[D0P2]](%i0)
// CHECK-NEXT: [[VAL41:%.*]] = affine.apply [[ID1]](%i1)
// CHECK-NEXT: vector_transfer_write {{.*}}, {{.*}}, [[VAL40]], [[VAL41]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
// CHECK-NEXT: [[VAL50:%.*]] = affine.apply [[D0P2]](%i0)
// CHECK-NEXT: [[VAL51:%.*]] = affine.apply [[D0P8]](%i1)
// CHECK-NEXT: vector_transfer_write {{.*}}, {{.*}}, [[VAL50]], [[VAL51]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
// CHECK-NEXT: %[[VAL00:.*]] = affine.apply [[ID1]](%i0)
// CHECK-NEXT: %[[VAL01:.*]] = affine.apply [[ID1]](%i1)
// CHECK-NEXT: vector.transfer_write {{.*}}, {{.*}}[%[[VAL00]], %[[VAL01]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
// CHECK-NEXT: %[[VAL10:.*]] = affine.apply [[ID1]](%i0)
// CHECK-NEXT: %[[VAL11:.*]] = affine.apply [[D0P8]](%i1)
// CHECK-NEXT: vector.transfer_write {{.*}}, {{.*}}[%[[VAL10]], %[[VAL11]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
// CHECK-NEXT: %[[VAL20:.*]] = affine.apply [[D0P1]](%i0)
// CHECK-NEXT: %[[VAL21:.*]] = affine.apply [[ID1]](%i1)
// CHECK-NEXT: vector.transfer_write {{.*}}, {{.*}}[%[[VAL20]], %[[VAL21]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
// CHECK-NEXT: %[[VAL30:.*]] = affine.apply [[D0P1]](%i0)
// CHECK-NEXT: %[[VAL31:.*]] = affine.apply [[D0P8]](%i1)
// CHECK-NEXT: vector.transfer_write {{.*}}, {{.*}}[%[[VAL30]], %[[VAL31]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
// CHECK-NEXT: %[[VAL40:.*]] = affine.apply [[D0P2]](%i0)
// CHECK-NEXT: %[[VAL41:.*]] = affine.apply [[ID1]](%i1)
// CHECK-NEXT: vector.transfer_write {{.*}}, {{.*}}[%[[VAL40]], %[[VAL41]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
// CHECK-NEXT: %[[VAL50:.*]] = affine.apply [[D0P2]](%i0)
// CHECK-NEXT: %[[VAL51:.*]] = affine.apply [[D0P8]](%i1)
// CHECK-NEXT: vector.transfer_write {{.*}}, {{.*}}[%[[VAL50]], %[[VAL51]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
affine.for %i0 = 0 to %M {
affine.for %i1 = 0 to %N {
// non-scoped %f1
@ -63,40 +63,40 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
// CHECK-NEXT: affine.for %i5 = 0 to %arg1 step 16 {
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = vector_transfer_read
// CHECK-NEXT: {{.*}} = vector.transfer_read
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = vector_transfer_read
// CHECK-NEXT: {{.*}} = vector.transfer_read
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = vector_transfer_read
// CHECK-NEXT: {{.*}} = vector.transfer_read
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = vector_transfer_read
// CHECK-NEXT: {{.*}} = vector.transfer_read
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = vector_transfer_read
// CHECK-NEXT: {{.*}} = vector.transfer_read
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = vector_transfer_read
// CHECK-NEXT: {{.*}} = vector.transfer_read
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = vector_transfer_read
// CHECK-NEXT: {{.*}} = vector.transfer_read
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = vector_transfer_read
// CHECK-NEXT: {{.*}} = vector.transfer_read
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = vector_transfer_read
// CHECK-NEXT: {{.*}} = vector.transfer_read
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = vector_transfer_read
// CHECK-NEXT: {{.*}} = vector.transfer_read
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = vector_transfer_read
// CHECK-NEXT: {{.*}} = vector.transfer_read
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = vector_transfer_read
// CHECK-NEXT: {{.*}} = vector.transfer_read
// CHECK-NEXT: {{.*}} = addf {{.*}} : vector<8xf32>
// CHECK-NEXT: {{.*}} = addf {{.*}} : vector<8xf32>
// CHECK-NEXT: {{.*}} = addf {{.*}} : vector<8xf32>
@ -105,22 +105,22 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
// CHECK-NEXT: {{.*}} = addf {{.*}} : vector<8xf32>
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: vector_transfer_write
// CHECK-NEXT: vector.transfer_write
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: vector_transfer_write
// CHECK-NEXT: vector.transfer_write
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: vector_transfer_write
// CHECK-NEXT: vector.transfer_write
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: vector_transfer_write
// CHECK-NEXT: vector.transfer_write
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: vector_transfer_write
// CHECK-NEXT: vector.transfer_write
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: vector_transfer_write
// CHECK-NEXT: vector.transfer_write
//
affine.for %i4 = 0 to %M {
affine.for %i5 = 0 to %N {

View File

@ -17,12 +17,12 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
// CHECK-NEXT: affine.for %i1 = 0 to %arg1 step 32 {
// CHECK-NEXT: {{.*}} = constant splat<vector<3x16xf32>, 1.000000e+00> : vector<3x16xf32>
// CHECK-NEXT: {{.*}} = constant splat<vector<3x16xf32>, 1.000000e+00> : vector<3x16xf32>
// CHECK-NEXT: [[VAL00:%.*]] = affine.apply [[ID1]](%i0)
// CHECK-NEXT: [[VAL01:%.*]] = affine.apply [[ID1]](%i1)
// CHECK-NEXT: vector_transfer_write {{.*}}, {{.*}}, [[VAL00]], [[VAL01]] {permutation_map: [[ID2]]} : vector<3x16xf32>
// CHECK-NEXT: [[VAL10:%.*]] = affine.apply [[ID1]](%i0)
// CHECK-NEXT: [[VAL11:%.*]] = affine.apply [[D0P16]](%i1)
// CHECK-NEXT: vector_transfer_write {{.*}}, {{.*}}, [[VAL10]], [[VAL11]] {permutation_map: [[ID2]]} : vector<3x16xf32>
// CHECK-NEXT: %[[VAL00:.*]] = affine.apply [[ID1]](%i0)
// CHECK-NEXT: %[[VAL01:.*]] = affine.apply [[ID1]](%i1)
// CHECK-NEXT: vector.transfer_write {{.*}}, {{.*}}[%[[VAL00]], %[[VAL01]]] {permutation_map: [[ID2]]} : vector<3x16xf32>, memref<?x?xf32>
// CHECK-NEXT: %[[VAL10:.*]] = affine.apply [[ID1]](%i0)
// CHECK-NEXT: %[[VAL11:.*]] = affine.apply [[D0P16]](%i1)
// CHECK-NEXT: vector.transfer_write {{.*}}, {{.*}}[%[[VAL10]], %[[VAL11]]] {permutation_map: [[ID2]]} : vector<3x16xf32>, memref<?x?xf32>
//
affine.for %i0 = 0 to %M {
affine.for %i1 = 0 to %N {
@ -35,12 +35,12 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
// CHECK-NEXT: affine.for %i3 = 0 to %arg1 step 32 {
// CHECK-NEXT: {{.*}} = constant splat<vector<3x16xf32>, 2.000000e+00> : vector<3x16xf32>
// CHECK-NEXT: {{.*}} = constant splat<vector<3x16xf32>, 2.000000e+00> : vector<3x16xf32>
// CHECK-NEXT: [[VAL00:%.*]] = affine.apply [[ID1]](%i2)
// CHECK-NEXT: [[VAL01:%.*]] = affine.apply [[ID1]](%i3)
// CHECK-NEXT: vector_transfer_write {{.*}}, {{.*}}, [[VAL00]], [[VAL01]] {permutation_map: [[ID2]]} : vector<3x16xf32>
// CHECK-NEXT: [[VAL10:%.*]] = affine.apply [[ID1]](%i2)
// CHECK-NEXT: [[VAL11:%.*]] = affine.apply [[D0P16]](%i3)
// CHECK-NEXT: vector_transfer_write {{.*}}, {{.*}}, [[VAL10]], [[VAL11]] {permutation_map: [[ID2]]} : vector<3x16xf32>
// CHECK-NEXT: %[[VAL00:.*]] = affine.apply [[ID1]](%i2)
// CHECK-NEXT: %[[VAL01:.*]] = affine.apply [[ID1]](%i3)
// CHECK-NEXT: vector.transfer_write {{.*}}, {{.*}}[%[[VAL00]], %[[VAL01]]] {permutation_map: [[ID2]]} : vector<3x16xf32>, memref<?x?xf32>
// CHECK-NEXT: %[[VAL10:.*]] = affine.apply [[ID1]](%i2)
// CHECK-NEXT: %[[VAL11:.*]] = affine.apply [[D0P16]](%i3)
// CHECK-NEXT: vector.transfer_write {{.*}}, {{.*}}[%[[VAL10]], %[[VAL11]]] {permutation_map: [[ID2]]} : vector<3x16xf32>, memref<?x?xf32>
//
affine.for %i2 = 0 to %M {
affine.for %i3 = 0 to %N {
@ -53,24 +53,24 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
// CHECK-NEXT: affine.for %i5 = 0 to %arg1 step 32 {
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = vector_transfer_read
// CHECK-NEXT: {{.*}} = vector.transfer_read
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = vector_transfer_read
// CHECK-NEXT: {{.*}} = vector.transfer_read
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = vector_transfer_read
// CHECK-NEXT: {{.*}} = vector.transfer_read
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = vector_transfer_read
// CHECK-NEXT: {{.*}} = vector.transfer_read
// CHECK-NEXT: {{.*}} = addf {{.*}} : vector<3x16xf32>
// CHECK-NEXT: {{.*}} = addf {{.*}} : vector<3x16xf32>
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: vector_transfer_write
// CHECK-NEXT: vector.transfer_write
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: {{.*}} = affine.apply
// CHECK-NEXT: vector_transfer_write
// CHECK-NEXT: vector.transfer_write
//
affine.for %i4 = 0 to %M {
affine.for %i5 = 0 to %N {

View File

@ -13,7 +13,7 @@
// Maps introduced to vectorize fastest varying memory index.
// CHECK-LABEL: func @vec1d_1
func @vec1d_1(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
// CHECK-DAG: [[C0:%[a-z0-9_]+]] = constant 0 : index
// CHECK-DAG: %[[C0:[a-z0-9_]+]] = constant 0 : index
// CHECK-DAG: [[ARG_M:%[0-9]+]] = dim %arg0, 0 : memref<?x?xf32>
// CHECK-DAG: [[ARG_N:%[0-9]+]] = dim %arg0, 1 : memref<?x?xf32>
// CHECK-DAG: [[ARG_P:%[0-9]+]] = dim %arg1, 2 : memref<?x?x?xf32>
@ -23,7 +23,7 @@ func @vec1d_1(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
%cst0 = constant 0 : index
//
// CHECK: for {{.*}} step 128
// CHECK-NEXT: {{.*}} = vector_transfer_read %arg0, [[C0]], [[C0]] {permutation_map: #[[map_proj_d0d1_0]]} : (memref<?x?xf32>, index, index) -> vector<128xf32>
// CHECK-NEXT: {{.*}} = vector.transfer_read %arg0[%[[C0]], %[[C0]]] {permutation_map: #[[map_proj_d0d1_0]]} : memref<?x?xf32>, vector<128xf32>
affine.for %i0 = 0 to %M { // vectorized due to scalar -> vector
%a0 = load %A[%cst0, %cst0] : memref<?x?xf32>
}
@ -32,7 +32,7 @@ func @vec1d_1(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
// CHECK-LABEL: func @vec1d_2
func @vec1d_2(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
// CHECK-DAG: [[C0:%[a-z0-9_]+]] = constant 0 : index
// CHECK-DAG: %[[C0:[a-z0-9_]+]] = constant 0 : index
// CHECK-DAG: [[ARG_M:%[0-9]+]] = dim %arg0, 0 : memref<?x?xf32>
// CHECK-DAG: [[ARG_N:%[0-9]+]] = dim %arg0, 1 : memref<?x?xf32>
// CHECK-DAG: [[ARG_P:%[0-9]+]] = dim %arg1, 2 : memref<?x?x?xf32>
@ -42,8 +42,8 @@ func @vec1d_2(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
%cst0 = constant 0 : index
//
// CHECK:for [[IV3:%[a-zA-Z0-9]+]] = 0 to [[ARG_M]] step 128
// CHECK-NEXT: [[APP3:%[a-zA-Z0-9]+]] = affine.apply {{.*}}[[IV3]]
// CHECK-NEXT: {{.*}} = vector_transfer_read %arg0, [[C0]], [[APP3]] {permutation_map: #[[map_proj_d0d1_d1]]} : {{.*}} -> vector<128xf32>
// CHECK-NEXT: %[[APP3:[a-zA-Z0-9]+]] = affine.apply {{.*}}[[IV3]]
// CHECK-NEXT: {{.*}} = vector.transfer_read %arg0[%[[C0]], %[[APP3]]] {permutation_map: #[[map_proj_d0d1_d1]]} : memref<?x?xf32>, vector<128xf32>
affine.for %i3 = 0 to %M { // vectorized
%r3 = affine.apply (d0) -> (d0) (%i3)
%a3 = load %A[%cst0, %r3] : memref<?x?xf32>
@ -53,7 +53,7 @@ func @vec1d_2(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
// CHECK-LABEL: func @vec1d_3
func @vec1d_3(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
// CHECK-DAG: [[C0:%[a-z0-9_]+]] = constant 0 : index
// CHECK-DAG: %[[C0:[a-z0-9_]+]] = constant 0 : index
// CHECK-DAG: [[ARG_M:%[0-9]+]] = dim %arg0, 0 : memref<?x?xf32>
// CHECK-DAG: [[ARG_N:%[0-9]+]] = dim %arg0, 1 : memref<?x?xf32>
// CHECK-DAG: [[ARG_P:%[0-9]+]] = dim %arg1, 2 : memref<?x?x?xf32>
@ -64,9 +64,9 @@ func @vec1d_3(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
//
// CHECK:for [[IV8:%[i0-9]+]] = 0 to [[ARG_M]] step 128
// CHECK-NEXT: for [[IV9:%[i0-9]*]] = 0 to [[ARG_N]] {
// CHECK-NEXT: [[APP9_0:%[0-9]+]] = affine.apply {{.*}}([[IV8]], [[IV9]])
// CHECK-NEXT: [[APP9_1:%[0-9]+]] = affine.apply {{.*}}([[IV8]], [[IV9]])
// CHECK-NEXT: {{.*}} = vector_transfer_read %arg0, [[APP9_0]], [[APP9_1]] {permutation_map: #[[map_proj_d0d1_d1]]} : {{.*}} -> vector<128xf32>
// CHECK-NEXT: %[[APP9_0:[0-9]+]] = affine.apply {{.*}}([[IV8]], [[IV9]])
// CHECK-NEXT: %[[APP9_1:[0-9]+]] = affine.apply {{.*}}([[IV8]], [[IV9]])
// CHECK-NEXT: {{.*}} = vector.transfer_read %arg0[%[[APP9_0]], %[[APP9_1]]] {permutation_map: #[[map_proj_d0d1_d1]]} : memref<?x?xf32>, vector<128xf32>
affine.for %i8 = 0 to %M { // vectorized
affine.for %i9 = 0 to %N {
%r90 = affine.apply (d0, d1) -> (d1) (%i8, %i9)
@ -87,7 +87,7 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
affine.for %i0 = 0 to %M {
affine.for %i1 = 0 to %N {
// CHECK: [[C1:%.*]] = constant splat<vector<128xf32>, 1.000000e+00> : vector<128xf32>
// CHECK: vector_transfer_write [[C1]], {{.*}} {permutation_map: #[[map_proj_d0d1_d1]]} : vector<128xf32>, memref<?x?xf32>, index, index
// CHECK: vector.transfer_write [[C1]], {{.*}} {permutation_map: #[[map_proj_d0d1_d1]]} : vector<128xf32>, memref<?x?xf32>
// non-scoped %f1
store %f1, %A[%i0, %i1] : memref<?x?xf32, 0>
}
@ -95,22 +95,22 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
affine.for %i2 = 0 to %M {
affine.for %i3 = 0 to %N {
// CHECK: [[C3:%.*]] = constant splat<vector<128xf32>, 2.000000e+00> : vector<128xf32>
// CHECK: vector_transfer_write [[C3]], {{.*}} {permutation_map: #[[map_proj_d0d1_d1]]} : vector<128xf32>, memref<?x?xf32>, index, index
// CHECK: vector.transfer_write [[C3]], {{.*}} {permutation_map: #[[map_proj_d0d1_d1]]} : vector<128xf32>, memref<?x?xf32>
// non-scoped %f2
store %f2, %B[%i2, %i3] : memref<?x?xf32, 0>
}
}
affine.for %i4 = 0 to %M {
affine.for %i5 = 0 to %N {
// CHECK: [[A5:%.*]] = vector_transfer_read %0, {{.*}} {permutation_map: #[[map_proj_d0d1_d1]]} : (memref<?x?xf32>, index, index) -> vector<128xf32>
// CHECK: [[B5:%.*]] = vector_transfer_read %1, {{.*}} {permutation_map: #[[map_proj_d0d1_d1]]} : (memref<?x?xf32>, index, index) -> vector<128xf32>
// CHECK: [[A5:%.*]] = vector.transfer_read %0[{{.*}}] {permutation_map: #[[map_proj_d0d1_d1]]} : memref<?x?xf32>, vector<128xf32>
// CHECK: [[B5:%.*]] = vector.transfer_read %1[{{.*}}] {permutation_map: #[[map_proj_d0d1_d1]]} : memref<?x?xf32>, vector<128xf32>
// CHECK: [[S5:%.*]] = addf [[A5]], [[B5]] : vector<128xf32>
// CHECK: [[SPLAT1:%.*]] = constant splat<vector<128xf32>, 1.000000e+00> : vector<128xf32>
// CHECK: [[S6:%.*]] = addf [[S5]], [[SPLAT1]] : vector<128xf32>
// CHECK: [[SPLAT2:%.*]] = constant splat<vector<128xf32>, 2.000000e+00> : vector<128xf32>
// CHECK: [[S7:%.*]] = addf [[S5]], [[SPLAT2]] : vector<128xf32>
// CHECK: [[S8:%.*]] = addf [[S7]], [[S6]] : vector<128xf32>
// CHECK: vector_transfer_write [[S8]], {{.*}} {permutation_map: #[[map_proj_d0d1_d1]]} : vector<128xf32>, memref<?x?xf32>, index, index
// CHECK: vector.transfer_write [[S8]], {{.*}} {permutation_map: #[[map_proj_d0d1_d1]]} : vector<128xf32>, memref<?x?xf32>
%a5 = load %A[%i4, %i5] : memref<?x?xf32, 0>
%b5 = load %B[%i4, %i5] : memref<?x?xf32, 0>
%s5 = addf %a5, %b5 : f32
@ -179,9 +179,9 @@ func @vec_rejected_3(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
//
// CHECK:for [[IV4:%[i0-9]+]] = 0 to [[ARG_M]] step 128 {
// CHECK-NEXT: for [[IV5:%[i0-9]*]] = 0 to [[ARG_N]] {
// CHECK-NEXT: [[APP50:%[0-9]+]] = affine.apply {{.*}}([[IV4]], [[IV5]])
// CHECK-NEXT: [[APP51:%[0-9]+]] = affine.apply {{.*}}([[IV4]], [[IV5]])
// CHECK-NEXT: {{.*}} = vector_transfer_read %arg0, [[APP50]], [[APP51]] {permutation_map: #[[map_proj_d0d1_d1]]} : {{.*}} -> vector<128xf32>
// CHECK-NEXT: %[[APP50:[0-9]+]] = affine.apply {{.*}}([[IV4]], [[IV5]])
// CHECK-NEXT: %[[APP51:[0-9]+]] = affine.apply {{.*}}([[IV4]], [[IV5]])
// CHECK-NEXT: {{.*}} = vector.transfer_read %arg0[%[[APP50]], %[[APP51]]] {permutation_map: #[[map_proj_d0d1_d1]]} : memref<?x?xf32>, vector<128xf32>
affine.for %i4 = 0 to %M { // vectorized
affine.for %i5 = 0 to %N { // not vectorized, would vectorize with --test-fastest-varying=1
%r50 = affine.apply (d0, d1) -> (d1) (%i4, %i5)
@ -289,7 +289,7 @@ func @vec_rejected_7(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
// CHECK-LABEL: func @vec_rejected_8
func @vec_rejected_8(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
// CHECK-DAG: [[C0:%[a-z0-9_]+]] = constant 0 : index
// CHECK-DAG: %[[C0:[a-z0-9_]+]] = constant 0 : index
// CHECK-DAG: [[ARG_M:%[0-9]+]] = dim %arg0, 0 : memref<?x?xf32>
// CHECK-DAG: [[ARG_N:%[0-9]+]] = dim %arg0, 1 : memref<?x?xf32>
// CHECK-DAG: [[ARG_P:%[0-9]+]] = dim %arg1, 2 : memref<?x?x?xf32>
@ -300,7 +300,7 @@ func @vec_rejected_8(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
//
// CHECK: affine.for %i{{[0-9]*}} = 0 to %{{[0-9]*}} {
// CHECK: for [[IV18:%[a-zA-Z0-9]+]] = 0 to [[ARG_M]] step 128
// CHECK: {{.*}} = vector_transfer_read %arg0, [[C0]], [[C0]] {permutation_map: #[[map_proj_d0d1_0]]} : {{.*}} -> vector<128xf32>
// CHECK: {{.*}} = vector.transfer_read %arg0[%[[C0]], %[[C0]]] {permutation_map: #[[map_proj_d0d1_0]]} : memref<?x?xf32>, vector<128xf32>
affine.for %i17 = 0 to %M { // not vectorized, the 1-D pattern that matched %i18 in DFS post-order prevents vectorizing %i17
affine.for %i18 = 0 to %M { // vectorized due to scalar -> vector
%a18 = load %A[%cst0, %cst0] : memref<?x?xf32>
@ -311,7 +311,7 @@ func @vec_rejected_8(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
// CHECK-LABEL: func @vec_rejected_9
func @vec_rejected_9(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
// CHECK-DAG: [[C0:%[a-z0-9_]+]] = constant 0 : index
// CHECK-DAG: %[[C0:[a-z0-9_]+]] = constant 0 : index
// CHECK-DAG: [[ARG_M:%[0-9]+]] = dim %arg0, 0 : memref<?x?xf32>
// CHECK-DAG: [[ARG_N:%[0-9]+]] = dim %arg0, 1 : memref<?x?xf32>
// CHECK-DAG: [[ARG_P:%[0-9]+]] = dim %arg1, 2 : memref<?x?x?xf32>
@ -322,7 +322,7 @@ func @vec_rejected_9(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
//
// CHECK: affine.for %i{{[0-9]*}} = 0 to %{{[0-9]*}} {
// CHECK: for [[IV18:%[a-zA-Z0-9]+]] = 0 to [[ARG_M]] step 128
// CHECK: {{.*}} = vector_transfer_read %arg0, [[C0]], [[C0]] {permutation_map: #[[map_proj_d0d1_0]]} : {{.*}} -> vector<128xf32>
// CHECK: {{.*}} = vector.transfer_read %arg0[%[[C0]], %[[C0]]] {permutation_map: #[[map_proj_d0d1_0]]} : memref<?x?xf32>, vector<128xf32>
affine.for %i17 = 0 to %M { // not vectorized, the 1-D pattern that matched %i18 in DFS post-order prevents vectorizing %i17
affine.for %i18 = 0 to %M { // vectorized due to scalar -> vector
%a18 = load %A[%cst0, %cst0] : memref<?x?xf32>

View File

@ -22,7 +22,7 @@ func @vec2d(%A : memref<?x?x?xf32>) {
// affine.for %i0 = 0 to %0 {
// affine.for %i1 = 0 to %1 step 32 {
// affine.for %i2 = 0 to %2 step 256 {
// %3 = "vector_transfer_read"(%arg0, %i0, %i1, %i2) : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
// %3 = "vector.transfer_read"(%arg0, %i0, %i1, %i2) : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
affine.for %i0 = 0 to %M {
affine.for %i1 = 0 to %N {
affine.for %i2 = 0 to %P {
@ -54,7 +54,7 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
affine.for %i0 = 0 to %M {
affine.for %i1 = 0 to %N {
// CHECK: [[C1:%.*]] = constant splat<vector<32x256xf32>, 1.000000e+00> : vector<32x256xf32>
// CHECK: vector_transfer_write [[C1]], {{.*}} {permutation_map: #[[map_id2]]} : vector<32x256xf32>, memref<?x?xf32>, index, index
// CHECK: vector.transfer_write [[C1]], {{.*}} {permutation_map: #[[map_id2]]} : vector<32x256xf32>, memref<?x?xf32>
// non-scoped %f1
store %f1, %A[%i0, %i1] : memref<?x?xf32, 0>
}
@ -62,22 +62,22 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
affine.for %i2 = 0 to %M {
affine.for %i3 = 0 to %N {
// CHECK: [[C3:%.*]] = constant splat<vector<32x256xf32>, 2.000000e+00> : vector<32x256xf32>
// CHECK: vector_transfer_write [[C3]], {{.*}} {permutation_map: #[[map_id2]]} : vector<32x256xf32>, memref<?x?xf32>, index, index
// CHECK: vector.transfer_write [[C3]], {{.*}} {permutation_map: #[[map_id2]]} : vector<32x256xf32>, memref<?x?xf32>
// non-scoped %f2
store %f2, %B[%i2, %i3] : memref<?x?xf32, 0>
}
}
affine.for %i4 = 0 to %M {
affine.for %i5 = 0 to %N {
// CHECK: [[A5:%.*]] = vector_transfer_read %0, {{.*}} {permutation_map: #[[map_id2]]} : (memref<?x?xf32>, index, index) -> vector<32x256xf32>
// CHECK: [[B5:%.*]] = vector_transfer_read %1, {{.*}} {permutation_map: #[[map_id2]]} : (memref<?x?xf32>, index, index) -> vector<32x256xf32>
// CHECK: [[A5:%.*]] = vector.transfer_read %0[{{.*}}] {permutation_map: #[[map_id2]]} : memref<?x?xf32>, vector<32x256xf32>
// CHECK: [[B5:%.*]] = vector.transfer_read %1[{{.*}}] {permutation_map: #[[map_id2]]} : memref<?x?xf32>, vector<32x256xf32>
// CHECK: [[S5:%.*]] = addf [[A5]], [[B5]] : vector<32x256xf32>
// CHECK: [[SPLAT1:%.*]] = constant splat<vector<32x256xf32>, 1.000000e+00> : vector<32x256xf32>
// CHECK: [[S6:%.*]] = addf [[S5]], [[SPLAT1]] : vector<32x256xf32>
// CHECK: [[SPLAT2:%.*]] = constant splat<vector<32x256xf32>, 2.000000e+00> : vector<32x256xf32>
// CHECK: [[S7:%.*]] = addf [[S5]], [[SPLAT2]] : vector<32x256xf32>
// CHECK: [[S8:%.*]] = addf [[S7]], [[S6]] : vector<32x256xf32>
// CHECK: vector_transfer_write [[S8]], {{.*}} {permutation_map: #[[map_id2]]} : vector<32x256xf32>, memref<?x?xf32>, index, index
// CHECK: vector.transfer_write [[S8]], {{.*}} {permutation_map: #[[map_id2]]} : vector<32x256xf32>, memref<?x?xf32>
//
%a5 = load %A[%i4, %i5] : memref<?x?xf32, 0>
%b5 = load %B[%i4, %i5] : memref<?x?xf32, 0>
@ -110,7 +110,7 @@ func @vectorize_matmul(%arg0: memref<?x?xf32>, %arg1: memref<?x?xf32>, %arg2: me
// VECT: {{.*}} #[[map_id1]](%[[M]]) step 4 {
// VECT-NEXT: {{.*}} #[[map_id1]](%[[N]]) step 8 {
// VECT: %[[VC0:.*]] = constant splat<vector<4x8xf32>, 0.000000e+00> : vector<4x8xf32>
// VECT-NEXT: vector_transfer_write %[[VC0]], %arg2, %{{.*}}, %{{.*}} {permutation_map: #[[map_id2]]}
// VECT-NEXT: vector.transfer_write %[[VC0]], %arg2[%{{.*}}, %{{.*}}] {permutation_map: #[[map_id2]]} : vector<4x8xf32>, memref<?x?xf32>
affine.for %i0 = (d0) -> (d0)(%c0) to (d0) -> (d0)(%M) {
affine.for %i1 = (d0) -> (d0)(%c0) to (d0) -> (d0)(%N) {
%cst = constant 0.000000e+00 : f32
@ -120,12 +120,12 @@ func @vectorize_matmul(%arg0: memref<?x?xf32>, %arg1: memref<?x?xf32>, %arg2: me
// VECT: affine.for %[[I2:.*]] = #[[map_id1]](%[[C0]]) to #[[map_id1]](%[[M]]) step 4 {
// VECT-NEXT: affine.for %[[I3:.*]] = #[[map_id1]](%[[C0]]) to #[[map_id1]](%[[N]]) step 8 {
// VECT-NEXT: affine.for %[[I4:.*]] = #map5(%[[C0]]) to #[[map_id1]](%[[K]]) {
// VECT-NEXT: %[[A:.*]] = vector_transfer_read %arg1, %[[I4]], %[[I3]] {permutation_map: #[[map_proj_d0d1_zerod1]]}
// VECT-NEXT: %[[B:.*]] = vector_transfer_read %arg0, %[[I2]], %[[I4]] {permutation_map: #[[map_proj_d0d1_d0zero]]}
// VECT-NEXT: %[[A:.*]] = vector_transfer_read %arg1[%[[I4]], %[[I3]]] {permutation_map: #[[map_proj_d0d1_zerod1]]} : memref<?x?xf32>, vector<4x8xf32>
// VECT-NEXT: %[[B:.*]] = vector_transfer_read %arg0[%[[I2]], %[[I4]]] {permutation_map: #[[map_proj_d0d1_d0zero]]} : memref<?x?xf32>, vector<4x8xf32>
// VECT-NEXT: %[[C:.*]] = mulf %[[B]], %[[A]] : vector<4x8xf32>
// VECT-NEXT: %[[D:.*]] = vector_transfer_read %arg2, %[[I2]], %[[I3]] {permutation_map: #[[map_id2]]}
// VECT-NEXT: %[[D:.*]] = vector_transfer_read %arg2[%[[I2]], %[[I3]]] {permutation_map: #[[map_id2]]} : memref<?x?xf32>, vector<4x8xf32>
// VECT-NEXT: %[[E:.*]] = addf %[[D]], %[[C]] : vector<4x8xf32>
// VECT-NEXT: vector_transfer_write %[[E]], %arg2, %[[I2]], %[[I3]] {permutation_map: #[[map_id2]]} : vector<4x8xf32>, memref<?x?xf32>, index, index
// VECT-NEXT: vector_transfer_write %[[E]], %arg2[%[[I2]], %[[I3]]] {permutation_map: #[[map_id2]]} : vector<4x8xf32>, memref<?x?xf32>
affine.for %i2 = (d0) -> (d0)(%c0) to (d0) -> (d0)(%M) {
affine.for %i3 = (d0) -> (d0)(%c0) to (d0) -> (d0)(%N) {
affine.for %i4 = (d0) -> (d0)(%c0) to (d0) -> (d0)(%K) {

View File

@ -12,7 +12,7 @@ func @vec3d(%A : memref<?x?x?xf32>) {
// CHECK: affine.for %i2 = 0 to %0 step 32 {
// CHECK: affine.for %i3 = 0 to %1 step 64 {
// CHECK: affine.for %i4 = 0 to %2 step 256 {
// CHECK: %3 = vector_transfer_read %arg0, %i2, %i3, %i4 {permutation_map: #[[map_proj_d0d1d2_d0d1d2]]} : (memref<?x?x?xf32>, index, index, index) -> vector<32x64x256xf32>
// CHECK: %3 = vector_transfer_read %arg0[%i2, %i3, %i4] {permutation_map: #[[map_proj_d0d1d2_d0d1d2]]} : memref<?x?x?xf32>, vector<32x64x256xf32>
affine.for %t0 = 0 to %0 {
affine.for %t1 = 0 to %0 {
affine.for %i0 = 0 to %0 {

View File

@ -10,7 +10,7 @@ func @vec2d(%A : memref<?x?x?xf32>) {
// CHECK: affine.for %i0 = 0 to %0 step 32
// CHECK: affine.for %i1 = 0 to %1 {
// CHECK: affine.for %i2 = 0 to %2 step 256
// CHECK: {{.*}} = vector_transfer_read %arg0, %i0, %i1, %i2 {permutation_map: #[[map_proj_d0d1d2_d0d2]]} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
// CHECK: {{.*}} = vector_transfer_read %arg0[%i0, %i1, %i2] {permutation_map: #[[map_proj_d0d1d2_d0d2]]} : memref<?x?x?xf32>, vector<32x256xf32>
affine.for %i0 = 0 to %M {
affine.for %i1 = 0 to %N {
affine.for %i2 = 0 to %P {

View File

@ -22,7 +22,7 @@ func @vec2d(%A : memref<?x?x?xf32>) {
// CHECK: affine.for %i3 = 0 to %0 step 32
// CHECK: affine.for %i4 = 0 to %1 step 256
// CHECK: affine.for %i5 = 0 to %2 {
// CHECK: {{.*}} = vector_transfer_read %arg0, %i4, %i5, %i3 {permutation_map: #[[map_proj_d0d1d2_d2d0]]} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
// CHECK: {{.*}} = vector_transfer_read %arg0[%i4, %i5, %i3] {permutation_map: #[[map_proj_d0d1d2_d2d0]]} : memref<?x?x?xf32>, vector<32x256xf32>
affine.for %i3 = 0 to %M {
affine.for %i4 = 0 to %N {
affine.for %i5 = 0 to %P {
@ -40,12 +40,12 @@ func @vec2d_imperfectly_nested(%A : memref<?x?x?xf32>) {
// CHECK: affine.for %i0 = 0 to %0 step 32 {
// CHECK: affine.for %i1 = 0 to %1 {
// CHECK: affine.for %i2 = 0 to %2 step 256 {
// CHECK: %3 = vector_transfer_read %arg0, %i2, %i1, %i0 {permutation_map: #[[map_proj_d0d1d2_d2d0]]} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
// CHECK: %3 = vector_transfer_read %arg0[%i2, %i1, %i0] {permutation_map: #[[map_proj_d0d1d2_d2d0]]} : memref<?x?x?xf32>, vector<32x256xf32>
// CHECK: affine.for %i3 = 0 to %1 step 256 {
// CHECK: affine.for %i4 = 0 to %2 {
// CHECK: %4 = vector_transfer_read %arg0, %i3, %i4, %i0 {permutation_map: #[[map_proj_d0d1d2_d2d0]]} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
// CHECK: %4 = vector_transfer_read %arg0[%i3, %i4, %i0] {permutation_map: #[[map_proj_d0d1d2_d2d0]]} : memref<?x?x?xf32>, vector<32x256xf32>
// CHECK: affine.for %i5 = 0 to %2 {
// CHECK: %5 = vector_transfer_read %arg0, %i3, %i5, %i0 {permutation_map: #[[map_proj_d0d1d2_d2d0]]} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
// CHECK: %5 = vector_transfer_read %arg0[%i3, %i5, %i0] {permutation_map: #[[map_proj_d0d1d2_d2d0]]} : memref<?x?x?xf32>, vector<32x256xf32>
affine.for %i0 = 0 to %0 {
affine.for %i1 = 0 to %1 {
affine.for %i2 = 0 to %2 {

View File

@ -22,7 +22,7 @@ func @vec2d(%A : memref<?x?x?xf32>) {
// CHECK: affine.for %i3 = 0 to %0 step 32
// CHECK: affine.for %i4 = 0 to %1 {
// CHECK: affine.for %i5 = 0 to %2 step 256
// CHECK: {{.*}} = vector_transfer_read %arg0, %i4, %i5, %i3 {permutation_map: #[[map_proj_d0d1d2_d2d1]]} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
// CHECK: {{.*}} = vector_transfer_read %arg0[%i4, %i5, %i3] {permutation_map: #[[map_proj_d0d1d2_d2d1]]} : memref<?x?x?xf32>, vector<32x256xf32>
affine.for %i3 = 0 to %M {
affine.for %i4 = 0 to %N {
affine.for %i5 = 0 to %P {
@ -40,12 +40,12 @@ func @vec2d_imperfectly_nested(%A : memref<?x?x?xf32>) {
// CHECK: affine.for %i0 = 0 to %0 step 32 {
// CHECK: affine.for %i1 = 0 to %1 step 256 {
// CHECK: affine.for %i2 = 0 to %2 {
// CHECK: %3 = vector_transfer_read %arg0, %i2, %i1, %i0 {permutation_map: #[[map_proj_d0d1d2_d2d1]]} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
// CHECK: %3 = vector_transfer_read %arg0[%i2, %i1, %i0] {permutation_map: #[[map_proj_d0d1d2_d2d1]]} : memref<?x?x?xf32>, vector<32x256xf32>
// CHECK: affine.for %i3 = 0 to %1 {
// CHECK: affine.for %i4 = 0 to %2 step 256 {
// CHECK: %4 = vector_transfer_read %arg0, %i3, %i4, %i0 {permutation_map: #[[map_proj_d0d1d2_d2d1]]} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
// CHECK: %4 = vector_transfer_read %arg0[%i3, %i4, %i0] {permutation_map: #[[map_proj_d0d1d2_d2d1]]} : memref<?x?x?xf32>, vector<32x256xf32>
// CHECK: affine.for %i5 = 0 to %2 step 256 {
// CHECK: %5 = vector_transfer_read %arg0, %i3, %i5, %i0 {permutation_map: #[[map_proj_d0d1d2_d2d1]]} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
// CHECK: %5 = vector_transfer_read %arg0[%i3, %i5, %i0] {permutation_map: #[[map_proj_d0d1d2_d2d1]]} : memref<?x?x?xf32>, vector<32x256xf32>
affine.for %i0 = 0 to %0 {
affine.for %i1 = 0 to %1 {
affine.for %i2 = 0 to %2 {