Cleanup SuperVectorization dialect printing and parsing.
On the read side,

```mlir
%3 = vector_transfer_read %arg0, %i2, %i1, %i0 {permutation_map: (d0, d1, d2)->(d2, d0)} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
```

becomes:

```mlir
%3 = vector_transfer_read %arg0[%i2, %i1, %i0] {permutation_map: (d0, d1, d2)->(d2, d0)} : memref<?x?x?xf32>, vector<32x256xf32>
```

On the write side,

```mlir
vector_transfer_write %0, %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d0)} : vector<128xf32>, memref<?x?xf32>, index, index
```

becomes

```mlir
vector_transfer_write %0, %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0)} : vector<128xf32>, memref<?x?xf32>
```

Documentation will be cleaned up in a followup commit that also extracts a proper .md from the top of the file comments.

PiperOrigin-RevId: 241021879
parent a38792f7d1
commit c9d5f3418a
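As an orientation aid (not part of the diff itself), here is a minimal sketch of the new custom form, including the optional padding operand, assembled from the examples in the updated Vector dialect doc below; `%A`, the `%i*` indices, `%val` and `%f0` are placeholder names:

```mlir
// New form: indices in square brackets, optional padding value in parentheses,
// and a `memref-type, vector-type` trailer instead of a function type.
%v = vector.transfer_read %A[%i0, %i1, %i2], (%f0)
       {permutation_map: (d0, d1, d2) -> (d2, d1)} :
     memref<?x?x?xf32>, vector<32x256xf32>

// Writes follow the same convention.
vector.transfer_write %val, %A[%i0, %i1, %i2, %i3]
       {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
     vector<16x32x64xf32>, memref<?x?x?x?xf32>
```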
@@ -11,7 +11,7 @@ identifiers to enable powerful analysis and transformation. A symbolic
 identifier can be bound to an SSA value that is either an argument to the
 function, a value defined at the top level of that function (outside of all
 loops and if instructions), the result of a
-[`constant` operation](LangRef.md#'constant'-operation), or the result of an
+[`constant` operation](../LangRef.md#'constant'-operation), or the result of an
 [`affine.apply` operation](#'affine.apply'-operation) that recursively takes as
 arguments any symbolic identifiers. Dimensions may be bound not only to anything
 that a symbol is bound to, but also to induction variables of enclosing
@@ -30,7 +30,7 @@ operation ::= ssa-id `=` `affine.apply` affine-map dim-and-symbol-use-list
 ```

 The `affine.apply` instruction applies an
-[affine mapping](LangRef.md#affine-expressions) to a list of SSA values,
+[affine mapping](../LangRef.md#affine-expressions) to a list of SSA values,
 yielding a single SSA value. The number of dimension and symbol arguments to
 affine.apply must be equal to the respective number of dimensional and symbolic
 inputs to the affine mapping; the `affine.apply` instruction always returns one
@@ -64,7 +64,7 @@ The `affine.for` operation represents an affine loop nest. It has one region
 containing its body. This region must contain one block that terminates with
 [`affine.terminator`](#'affine.terminator"-operation). *Note:* when `affine.for`
 is printed in custom format, the terminator is omitted. The block has one
-argument of [`index`](LangRef.md#index-type) type that represents the induction
+argument of [`index`](../LangRef.md#index-type) type that represents the induction
 variable of the loop.

 The `affine.for` operation executes its body a number of times iterating from a
@@ -122,7 +122,7 @@ space defined by an integer set (a conjunction of affine constraints). A single
 `affine.if` may end with an optional `else` clause.

 The condition of the `affine.if` is represented by an
-[integer set](LangRef.md#integer-sets) (a conjunction of affine constraints),
+[integer set](../LangRef.md#integer-sets) (a conjunction of affine constraints),
 and the SSA values bound to the dimensions and symbols in the integer set. The
 [same restrictions](#restrictions-on-dimensions-and-symbols) hold for these SSA
 values as for all bindings of SSA values to dimensions and symbols.
@@ -167,7 +167,7 @@ loops ([`for`](#'for'-operation)) and branches ([`if`](#'if'-operation)). It
 unconditionally transmits the control flow to the successor of the operation
 enclosing the region.

-*Rationale*: bodies of affine operations are [blocks](LangRef.md#block) that
+*Rationale*: bodies of affine operations are [blocks](../LangRef.md#block) that
 must have terminators. Loops and branches represent structured control flow and
 should not accept arbitrary branches as terminators.

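Not part of the diff: for readers skimming the affine doc hunks above, a minimal illustrative use of `affine.apply` matching the EBNF quoted in the second hunk; the map name and operands are made up for the example:

```mlir
// Bind the result of a 2-d affine map to a new SSA value of `index` type.
#map0 = (d0, d1) -> (d0 + d1)
%sum = affine.apply #map0 (%i0, %i1)
```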
@@ -1,4 +1,4 @@
-# SuperVector Dialect
+# Vector Dialect

 This dialect provides mid-level abstraction for the MLIR super-vectorizer.

@@ -8,12 +8,12 @@ This dialect provides mid-level abstraction for the MLIR super-vectorizer.

 ### Vector transfers {#vector-transfers}

-#### `vector_transfer_read` operation {#'vector_transfer_read'-operation}
+#### `vector.transfer_read` operation {#'vector.transfer_read'-operation}

 Syntax:

 ``` {.ebnf}
-operation ::= ssa-id `=` `vector_transfer_read` ssa-use-list `{` attribute-entry `} :` function-type
+operation ::= ssa-id `=` `vector.transfer_read` ssa-use-list `{` attribute-entry `} :` function-type
 ```

 Examples:
@@ -25,9 +25,9 @@ Examples:
 for %i0 = 0 to %0 {
 affine.for %i1 = 0 to %1 step 256 {
 affine.for %i2 = 0 to %2 step 32 {
-%v = vector_transfer_read %A, %i0, %i1, %i2, %f0
+%v = vector.transfer_read %A[%i0, %i1, %i2], (%f0)
 {permutation_map: (d0, d1, d2) -> (d2, d1)} :
-(memref<?x?x?xf32>, index, index, f32) -> vector<32x256xf32>
+memref<?x?x?xf32>, vector<32x256xf32>
 }}}

 // Read the slice `%A[%i0, %i1]` (i.e. the element `%A[%i0, %i1]`) into
@@ -35,36 +35,36 @@ for %i0 = 0 to %0 {
 // broadcast:
 for %i0 = 0 to %0 {
 affine.for %i1 = 0 to %1 {
-%3 = vector_transfer_read %A, %i0, %i1
+%3 = vector.transfer_read %A[%i0, %i1]
 {permutation_map: (d0, d1) -> (0)} :
-(memref<?x?xf32>, index, index) -> vector<128xf32>
+memref<?x?xf32>, vector<128xf32>
 }
 }
 ```

-The `vector_transfer_read` performs a blocking read from a slice within a scalar
-[MemRef](#memref-type) supplied as its first operand into a
-[vector](#vector-type) of the same elemental type. The slice is further defined
-by a full-rank index within the MemRef, supplied as the operands `2 .. 1 +
-rank(memref)`. The permutation_map [attribute](#attributes) is an
-[affine-map](#affine-maps) which specifies the transposition on the slice to
-match the vector shape. The size of the slice is specified by the size of the
-vector, given as the return type. Optionally, an `ssa-value` of the same
-elemental type as the MemRef is provided as the last operand to specify padding
-in the case of out-of-bounds accesses. Absence of the optional padding value
-signifies the `vector_transfer_read` is statically guaranteed to remain within
-the MemRef bounds. This operation is called 'read' by opposition to 'load'
-because the super-vector granularity is generally not representable with a
-single hardware register. A `vector_transfer_read` is thus a mid-level
+The `vector.transfer_read` performs a blocking read from a slice within a scalar
+[MemRef](../LangRef.md#memref-type) supplied as its first operand into a
+[vector](../LangRef.md#vector-type) of the same elemental type. The slice is
+further defined by a full-rank index within the MemRef, supplied as the operands
+`2 .. 1 + rank(memref)`. The permutation_map [attribute](../LangRef.md#attributes)
+is an [affine-map](../LangRef.md#affine-maps) which specifies the transposition on
+the slice to match the vector shape. The size of the slice is specified by the
+size of the vector, given as the return type. Optionally, an `ssa-value` of the
+same elemental type as the MemRef is provided as the last operand to specify
+padding in the case of out-of-bounds accesses. Absence of the optional padding
+value signifies the `vector.transfer_read` is statically guaranteed to remain
+within the MemRef bounds. This operation is called 'read' by opposition to
+'load' because the super-vector granularity is generally not representable with
+a single hardware register. A `vector.transfer_read` is thus a mid-level
 abstraction that supports super-vectorization with non-effecting padding for
 full-tile-only code.

 More precisely, let's dive deeper into the permutation_map for the following :

 ```mlir {.mlir}
-vector_transfer_read %A, %expr1, %expr2, %expr3, %expr4
+vector.transfer_read %A[%expr1, %expr2, %expr3, %expr4]
 { permutation_map : (d0,d1,d2,d3) -> (d2,0,d0) } :
-(memref<?x?x?x?xf32>, index, index, index, index) -> vector<3x4x5xf32>
+memref<?x?x?x?xf32>, vector<3x4x5xf32>
 ```

 This operation always reads a slice starting at `%A[%expr1, %expr2, %expr3,
@@ -74,7 +74,7 @@ This operation always reads a slice starting at `%A[%expr1, %expr2, %expr3,
 That slice needs to be read into a `vector<3x4x5xf32>`. Since the permutation
 map is not full rank, there must be a broadcast along vector dimension `1`.

-A notional lowering of vector_transfer_read could generate code resembling:
+A notional lowering of vector.transfer_read could generate code resembling:

 ```mlir {.mlir}
 // %expr1, %expr2, %expr3, %expr4 defined before this point
@@ -117,12 +117,12 @@ the same amount of data as the `3 * 5` values transferred. An additional `1`
 broadcast is required. On a GPU this broadcast could be implemented using a
 warp-shuffle if loop `j` were mapped to `threadIdx.x`.

-#### `vector_transfer_write` operation {#'vector_transfer_write'-operation}
+#### `vector.transfer_write` operation {#'vector.transfer_write'-operation}

 Syntax:

 ``` {.ebnf}
-operation ::= `vector_transfer_write` ssa-use-list `{` attribute-entry `} :` vector-type ', ' memref-type ', ' index-type-list
+operation ::= `vector.transfer_write` ssa-use-list `{` attribute-entry `} :` vector-type ', ' memref-type ', ' index-type-list
 ```

 Examples:
@@ -134,45 +134,46 @@ for %i0 = 0 to %0 {
 affine.for %i2 = 0 to %2 step 64 {
 affine.for %i3 = 0 to %3 step 16 {
 %val = `ssa-value` : vector<16x32x64xf32>
-vector_transfer_write %val, %A, %i0, %i1, %i2, %i3
+vector.transfer_write %val, %A[%i0, %i1, %i2, %i3]
 {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
-vector<16x32x64xf32>, memref<?x?x?x?xf32>, index, index, index, index
+vector<16x32x64xf32>, memref<?x?x?x?xf32>
 }}}}
 ```

-The `vector_transfer_write` performs a blocking write from a
-[vector](#vector-type), supplied as its first operand, into a slice within a
-scalar [MemRef](#memref-type) of the same elemental type, supplied as its second
-operand. The slice is further defined by a full-rank index within the MemRef,
-supplied as the operands `3 .. 2 + rank(memref)`. The permutation_map
-[attribute](#attributes) is an [affine-map](#affine-maps) which specifies the
-transposition on the slice to match the vector shape. The size of the slice is
-specified by the size of the vector. This operation is called 'write' by
-opposition to 'store' because the super-vector granularity is generally not
-representable with a single hardware register. A `vector_transfer_write` is thus
-a mid-level abstraction that supports super-vectorization with non-effecting
-padding for full-tile-only code. It is the responsibility of
-`vector_transfer_write`'s implementation to ensure the memory writes are valid.
-Different lowerings may be pertinent depending on the hardware support.
+The `vector.transfer_write` performs a blocking write from a
+[vector](../LangRef.md#vector-type), supplied as its first operand, into a slice
+within a scalar [MemRef](../LangRef.md#memref-type) of the same elemental type,
+supplied as its second operand. The slice is further defined by a full-rank
+index within the MemRef, supplied as the operands `3 .. 2 + rank(memref)`. The
+permutation_map [attribute](../LangRef.md#attributes) is an
+[affine-map](../LangRef.md#affine-maps) which specifies the transposition on the
+slice to match the vector shape. The size of the slice is specified by the size
+of the vector. This operation is called 'write' by opposition to 'store' because
+the super-vector granularity is generally not representable with a single
+hardware register. A `vector.transfer_write` is thus a mid-level abstraction
+that supports super-vectorization with non-effecting padding for full-tile-only
+code. It is the responsibility of `vector.transfer_write`'s implementation to
+ensure the memory writes are valid. Different lowerings may be pertinent
+depending on the hardware support.

 ### Vector views {#vector-views}

-#### `vector_type_cast` operation {#'vector_type_cast'-operation}
+#### `vector.type_cast` operation {#'vector.type_cast'-operation}

 Syntax:

 ``` {.ebnf}
-operation ::= `vector_type_cast` ssa-use : memref-type, memref-type
+operation ::= `vector.type_cast` ssa-use : memref-type, memref-type
 ```

 Examples:

 ```mlir
 %A = alloc() : memref<5x4x3xf32>
-%VA = vector_type_cast %A : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>>
+%VA = vector.type_cast %A : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>>
 ```

-The `vector_type_cast` operation performs a conversion from a memref with scalar
+The `vector.type_cast` operation performs a conversion from a memref with scalar
 element to memref with a *single* vector element, copying the shape of the
 memref to the vector. This is the minimal viable operation that is required to
 make super-vectorization operational. It can be seen as a special case of the
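As an editorial aside (not part of the diff), a minimal sketch of how a `vector.type_cast` result is typically consumed, following the alloc / type_cast / load pattern that appears in the EDSC emitter comments later in this commit; the `%c0` constant and the trailing load are illustrative assumptions, not lines from the change:

```mlir
%A  = alloc() : memref<5x4x3xf32>
%VA = vector.type_cast %A : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>>
%c0 = constant 0 : index
// A single load of element 0 now yields the whole vector<5x4x3xf32> at once.
%v  = load %VA[%c0] : memref<1xvector<5x4x3xf32>>
```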
@@ -2211,7 +2211,7 @@ a conversion pass.
 Currently, MLIR supports the following dialects:

 * [Standard dialect](#standard-operations)
-* [SuperVector dialect](Dialects/SuperVector.md)
+* [Vector dialect](Dialects/Vector.md)
 * [TensorFlow dialect](#tensorflow-operations)

 ### TensorFlow operations {#tensorflow-operations}
@ -89,13 +89,13 @@ shapeRatio(VectorType superVectorType, VectorType subVectorType);
|
|||
/// affine.for %i3 = 0 to %0 step 32 {
|
||||
/// affine.for %i4 = 0 to %1 {
|
||||
/// affine.for %i5 = 0 to %2 step 256 {
|
||||
/// %4 = vector_transfer_read %arg0, %i4, %i5, %i3
|
||||
/// %4 = vector.transfer_read %arg0, %i4, %i5, %i3
|
||||
/// {permutation_map: (d0, d1, d2) -> (d2, d1)} :
|
||||
/// (memref<?x?x?xf32>, index, index) -> vector<32x256xf32>
|
||||
/// }}}
|
||||
/// ```
|
||||
///
|
||||
/// Meaning that vector_transfer_read will be responsible for reading the slice:
|
||||
/// Meaning that vector.transfer_read will be responsible for reading the slice:
|
||||
/// `%arg0[%i4, %i5:%15+256, %i3:%i3+32]` into vector<32x256xf32>.
|
||||
///
|
||||
/// Example 2:
|
||||
|
@ -112,13 +112,13 @@ shapeRatio(VectorType superVectorType, VectorType subVectorType);
|
|||
///
|
||||
/// ```mlir
|
||||
/// affine.for %i0 = 0 to %0 step 128 {
|
||||
/// %3 = vector_transfer_read %arg0, %c0_0, %c0_0
|
||||
/// %3 = vector.transfer_read %arg0, %c0_0, %c0_0
|
||||
/// {permutation_map: (d0, d1) -> (0)} :
|
||||
/// (memref<?x?xf32>, index, index) -> vector<128xf32>
|
||||
/// }
|
||||
/// ````
|
||||
///
|
||||
/// Meaning that vector_transfer_read will be responsible of reading the slice
|
||||
/// Meaning that vector.transfer_read will be responsible of reading the slice
|
||||
/// `%arg0[%c0, %c0]` into vector<128xf32> which needs a 1-D vector broadcast.
|
||||
///
|
||||
AffineMap makePermutationMap(
|
||||
|
@ -127,7 +127,7 @@ AffineMap makePermutationMap(
|
|||
|
||||
namespace matcher {
|
||||
|
||||
/// Matches vector_transfer_read, vector_transfer_write and ops that return a
|
||||
/// Matches vector.transfer_read, vector.transfer_write and ops that return a
|
||||
/// vector type that is a multiple of the sub-vector type. This allows passing
|
||||
/// over other smaller vector types in the function and avoids interfering with
|
||||
/// operations on those.
|
||||
|
@ -135,7 +135,7 @@ namespace matcher {
|
|||
/// TODO(ntv): this could all be much simpler if we added a bit that a vector
|
||||
/// type to mark that a vector is a strict super-vector but it still does not
|
||||
/// warrant adding even 1 extra bit in the IR for now.
|
||||
bool operatesOnSuperVectors(Operation &op, VectorType subVectorType);
|
||||
bool operatesOnSuperVectorsOf(Operation &op, VectorType subVectorType);
|
||||
|
||||
} // end namespace matcher
|
||||
} // end namespace mlir
|
||||
|
|
|
@ -26,7 +26,7 @@
|
|||
#include "mlir/AffineOps/AffineOps.h"
|
||||
#include "mlir/IR/Builders.h"
|
||||
#include "mlir/StandardOps/Ops.h"
|
||||
#include "mlir/SuperVectorOps/SuperVectorOps.h"
|
||||
#include "mlir/VectorOps/VectorOps.h"
|
||||
|
||||
namespace mlir {
|
||||
|
||||
|
|
|
@ -325,7 +325,7 @@ protected:
|
|||
/// ```mlir
|
||||
/// Stmt scalarValue, vectorValue, tmpAlloc, tmpDealloc, vectorView;
|
||||
/// tmpAlloc = alloc(tmpMemRefType);
|
||||
/// vectorView = vector_type_cast(tmpAlloc, vectorMemRefType),
|
||||
/// vectorView = vector.type_cast(tmpAlloc, vectorMemRefType),
|
||||
/// vectorValue = load(vectorView, zero),
|
||||
/// tmpDealloc = dealloc(tmpAlloc)});
|
||||
/// emitter.emitStmts({tmpAlloc, vectorView, vectorValue, tmpDealloc});
|
||||
|
@ -391,7 +391,7 @@ protected:
|
|||
/// Stmt scalarValue, vectorValue, tmpAlloc, tmpDealloc, vectorView;
|
||||
/// Stmt block = Block({
|
||||
/// tmpAlloc = alloc(tmpMemRefType),
|
||||
/// vectorView = vector_type_cast(tmpAlloc, vectorMemRefType),
|
||||
/// vectorView = vector.type_cast(tmpAlloc, vectorMemRefType),
|
||||
/// For(ivs, lbs, ubs, steps, {
|
||||
/// scalarValue = load(scalarMemRef,
|
||||
/// accessInfo.clippedScalarAccessExprs), store(scalarValue, tmpAlloc,
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
//===- SuperVectorOps.h - MLIR Super Vectorizer Operations ------*- C++ -*-===//
|
||||
//===- VectorOps.h - MLIR Super Vectorizer Operations -----------*- C++ -*-===//
|
||||
//
|
||||
// Copyright 2019 The MLIR Authors.
|
||||
//
|
||||
|
@ -20,8 +20,8 @@
|
|||
//
|
||||
//===----------------------------------------------------------------------===//
|
||||
|
||||
#ifndef MLIR_INCLUDE_MLIR_SUPERVECTOROPS_SUPERVECTOROPS_H
|
||||
#define MLIR_INCLUDE_MLIR_SUPERVECTOROPS_SUPERVECTOROPS_H
|
||||
#ifndef MLIR_VECTOROPS_VECTOROPS_H
|
||||
#define MLIR_VECTOROPS_VECTOROPS_H
|
||||
|
||||
#include "mlir/IR/Attributes.h"
|
||||
#include "mlir/IR/Dialect.h"
|
||||
|
@ -31,9 +31,9 @@
|
|||
namespace mlir {
|
||||
|
||||
/// Dialect for super-vectorization Ops.
|
||||
class SuperVectorOpsDialect : public Dialect {
|
||||
class VectorOpsDialect : public Dialect {
|
||||
public:
|
||||
SuperVectorOpsDialect(MLIRContext *context);
|
||||
VectorOpsDialect(MLIRContext *context);
|
||||
};
|
||||
|
||||
/// VectorTransferReadOp performs a blocking read from a scalar memref
|
||||
|
@ -70,14 +70,12 @@ public:
|
|||
/// ...
|
||||
/// %val = `ssa-value` : f32
|
||||
/// // let %i, %j, %k, %l be ssa-values of type index
|
||||
/// %v0 = vector_transfer_read %src, %i, %j, %k, %l
|
||||
/// %v0 = vector.transfer_read %src[%i, %j, %k, %l]
|
||||
/// {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
|
||||
/// (memref<?x?x?x?xf32>, index, index, index, index) ->
|
||||
/// vector<16x32x64xf32>
|
||||
/// %v1 = vector_transfer_read %src, %i, %j, %k, %l, %val
|
||||
/// memref<?x?x?x?xf32>, vector<16x32x64xf32>
|
||||
/// %v1 = vector.transfer_read %src[%i, %j, %k, %l], (%val)
|
||||
/// {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
|
||||
/// (memref<?x?x?x?xf32>, index, index, index, index, f32) ->
|
||||
/// vector<16x32x64xf32>
|
||||
/// memref<?x?x?x?xf32>, vector<16x32x64xf32>
|
||||
/// ```
|
||||
///
|
||||
/// Example with partial rank permutation_map:
|
||||
|
@ -86,10 +84,9 @@ public:
|
|||
/// %A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>
|
||||
/// ...
|
||||
/// // let %i, %j be ssa-values of type index
|
||||
/// %v0 = vector_transfer_read %src, %i, %c0, %c0, %c0
|
||||
/// %v0 = vector.transfer_read %src[%i, %c0, %c0, %c0]
|
||||
/// {permutation_map: (d0, d1, d2, d3) -> (0, d1, 0)} :
|
||||
/// (memref<?x?x?x?xf32>, index, index, index, index) ->
|
||||
/// vector<16x32x64xf32>
|
||||
/// memref<?x?x?x?xf32>, vector<16x32x64xf32>
|
||||
class VectorTransferReadOp
|
||||
: public Op<VectorTransferReadOp, OpTrait::VariadicOperands,
|
||||
OpTrait::OneResult> {
|
||||
|
@ -98,7 +95,7 @@ class VectorTransferReadOp
|
|||
public:
|
||||
using Op::Op;
|
||||
|
||||
static StringRef getOperationName() { return "vector_transfer_read"; }
|
||||
static StringRef getOperationName() { return "vector.transfer_read"; }
|
||||
static StringRef getPermutationMapAttrName() { return "permutation_map"; }
|
||||
static void build(Builder *builder, OperationState *result,
|
||||
VectorType vectorType, Value *srcMemRef,
|
||||
|
@ -133,7 +130,7 @@ public:
|
|||
///
|
||||
/// A vector transfer write has semantics similar to a vector store, with
|
||||
/// additional support for handling out-of-bounds situations. It is the
|
||||
/// responsibility of vector_transfer_write's implementation to ensure the
|
||||
/// responsibility of vector.transfer_write's implementation to ensure the
|
||||
/// memory writes are valid. Different implementations may be pertinent
|
||||
/// depending on the hardware support including:
|
||||
/// 1. predication;
|
||||
|
@ -146,9 +143,9 @@ public:
|
|||
/// %A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>.
|
||||
/// %val = `ssa-value` : vector<16x32x64xf32>
|
||||
/// // let %i, %j, %k, %l be ssa-values of type index
|
||||
/// vector_transfer_write %val, %src, %i, %j, %k, %l
|
||||
/// vector.transfer_write %val, %src[%i, %j, %k, %l]
|
||||
/// {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
|
||||
/// vector<16x32x64xf32>, memref<?x?x?x?xf32>, index, index, index, index
|
||||
/// vector<16x32x64xf32>, memref<?x?x?x?xf32>
|
||||
/// ```
|
||||
class VectorTransferWriteOp
|
||||
: public Op<VectorTransferWriteOp, OpTrait::VariadicOperands,
|
||||
|
@ -162,7 +159,7 @@ class VectorTransferWriteOp
|
|||
public:
|
||||
using Op::Op;
|
||||
|
||||
static StringRef getOperationName() { return "vector_transfer_write"; }
|
||||
static StringRef getOperationName() { return "vector.transfer_write"; }
|
||||
static StringRef getPermutationMapAttrName() { return "permutation_map"; }
|
||||
static void build(Builder *builder, OperationState *result, Value *srcVector,
|
||||
Value *dstMemRef, ArrayRef<Value *> dstIndices,
|
||||
|
@ -190,14 +187,14 @@ public:
|
|||
///
|
||||
/// ```mlir
|
||||
/// %A = alloc() : memref<5x4x3xf32>
|
||||
/// %VA = vector_type_cast %A : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>>
|
||||
/// %VA = vector.type_cast %A : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>>
|
||||
/// ```
|
||||
class VectorTypeCastOp
|
||||
: public Op<VectorTypeCastOp, OpTrait::OneOperand, OpTrait::OneResult> {
|
||||
public:
|
||||
using Op::Op;
|
||||
|
||||
static StringRef getOperationName() { return "vector_type_cast"; }
|
||||
static StringRef getOperationName() { return "vector.type_cast"; }
|
||||
static void build(Builder *builder, OperationState *result, Value *srcVector,
|
||||
Type dstType);
|
||||
static bool parse(OpAsmParser *parser, OperationState *result);
|
||||
|
@ -207,4 +204,4 @@ public:
|
|||
|
||||
} // end namespace mlir
|
||||
|
||||
#endif // MLIR_INCLUDE_MLIR_SUPERVECTOROPS_SUPERVECTOROPS_H
|
||||
#endif // MLIR_VECTOROPS_VECTOROPS_H
|
|
@ -30,9 +30,9 @@
|
|||
#include "mlir/IR/Builders.h"
|
||||
#include "mlir/IR/Operation.h"
|
||||
#include "mlir/StandardOps/Ops.h"
|
||||
#include "mlir/SuperVectorOps/SuperVectorOps.h"
|
||||
#include "mlir/Support/Functional.h"
|
||||
#include "mlir/Support/MathExtras.h"
|
||||
#include "mlir/VectorOps/VectorOps.h"
|
||||
|
||||
#include "llvm/ADT/DenseSet.h"
|
||||
#include "llvm/ADT/SmallString.h"
|
||||
|
|
|
@ -22,9 +22,9 @@
|
|||
#include "mlir/IR/IntegerSet.h"
|
||||
#include "mlir/IR/Operation.h"
|
||||
#include "mlir/StandardOps/Ops.h"
|
||||
#include "mlir/SuperVectorOps/SuperVectorOps.h"
|
||||
#include "mlir/Support/Functional.h"
|
||||
#include "mlir/Support/STLExtras.h"
|
||||
#include "mlir/VectorOps/VectorOps.h"
|
||||
|
||||
#include "llvm/ADT/DenseSet.h"
|
||||
#include "llvm/ADT/SetVector.h"
|
||||
|
@ -185,11 +185,11 @@ AffineMap mlir::makePermutationMap(
|
|||
return ::makePermutationMap(store.getIndices(), enclosingLoopToVectorDim);
|
||||
}
|
||||
|
||||
bool mlir::matcher::operatesOnSuperVectors(Operation &op,
|
||||
VectorType subVectorType) {
|
||||
bool mlir::matcher::operatesOnSuperVectorsOf(Operation &op,
|
||||
VectorType subVectorType) {
|
||||
// First, extract the vector type and ditinguish between:
|
||||
// a. ops that *must* lower a super-vector (i.e. vector_transfer_read,
|
||||
// vector_transfer_write); and
|
||||
// a. ops that *must* lower a super-vector (i.e. vector.transfer_read,
|
||||
// vector.transfer_write); and
|
||||
// b. ops that *may* lower a super-vector (all other ops).
|
||||
// The ops that *may* lower a super-vector only do so if the super-vector to
|
||||
// sub-vector ratio exists. The ops that *must* lower a super-vector are
|
||||
|
@ -218,7 +218,7 @@ bool mlir::matcher::operatesOnSuperVectors(Operation &op,
|
|||
return false;
|
||||
}
|
||||
} else {
|
||||
// Not a vector_transfer and has more than 1 result, fail hard for now to
|
||||
// Not a vector.transfer and has more than 1 result, fail hard for now to
|
||||
// wake us up when something changes.
|
||||
op.emitError("NYI: operation has more than 1 result");
|
||||
return false;
|
||||
|
@ -229,7 +229,7 @@ bool mlir::matcher::operatesOnSuperVectors(Operation &op,
|
|||
|
||||
// Sanity check.
|
||||
assert((ratio.hasValue() || !mustDivide) &&
|
||||
"vector_transfer operation in which super-vector size is not an"
|
||||
"vector.transfer operation in which super-vector size is not an"
|
||||
" integer multiple of sub-vector size");
|
||||
|
||||
// This catches cases that are not strictly necessary to have multiplicity but
|
||||
|
|
|
@ -18,7 +18,7 @@
|
|||
#include "mlir/EDSC/Intrinsics.h"
|
||||
#include "mlir/EDSC/Builders.h"
|
||||
#include "mlir/IR/AffineExpr.h"
|
||||
#include "mlir/SuperVectorOps/SuperVectorOps.h"
|
||||
#include "mlir/VectorOps/VectorOps.h"
|
||||
|
||||
using namespace mlir;
|
||||
using namespace mlir::edsc;
|
||||
|
|
|
@ -32,9 +32,9 @@
|
|||
#include "mlir/IR/Operation.h"
|
||||
#include "mlir/IR/Value.h"
|
||||
#include "mlir/StandardOps/Ops.h"
|
||||
#include "mlir/SuperVectorOps/SuperVectorOps.h"
|
||||
#include "mlir/Support/Functional.h"
|
||||
#include "mlir/Support/STLExtras.h"
|
||||
#include "mlir/VectorOps/VectorOps.h"
|
||||
|
||||
using llvm::dbgs;
|
||||
using llvm::errs;
|
||||
|
|
|
@ -27,8 +27,8 @@
|
|||
#include "mlir/IR/OperationSupport.h"
|
||||
#include "mlir/IR/StandardTypes.h"
|
||||
#include "mlir/StandardOps/Ops.h"
|
||||
#include "mlir/SuperVectorOps/SuperVectorOps.h"
|
||||
#include "mlir/Support/STLExtras.h"
|
||||
#include "mlir/VectorOps/VectorOps.h"
|
||||
|
||||
#include "llvm/ADT/STLExtras.h"
|
||||
#include "llvm/ADT/StringSwitch.h"
|
||||
|
|
|
@ -38,10 +38,10 @@
|
|||
#include "mlir/IR/Types.h"
|
||||
#include "mlir/Pass/Pass.h"
|
||||
#include "mlir/StandardOps/Ops.h"
|
||||
#include "mlir/SuperVectorOps/SuperVectorOps.h"
|
||||
#include "mlir/Support/Functional.h"
|
||||
#include "mlir/Transforms/MLPatternLoweringPass.h"
|
||||
#include "mlir/Transforms/Passes.h"
|
||||
#include "mlir/VectorOps/VectorOps.h"
|
||||
|
||||
/// Implements lowering of VectorTransferReadOp and VectorTransferWriteOp to a
|
||||
/// proper abstraction for the hardware.
|
||||
|
@ -58,9 +58,9 @@
|
|||
/// affine.for %i0 = 0 to %0 {
|
||||
/// affine.for %i1 = 0 to %1 step 256 {
|
||||
/// affine.for %i2 = 0 to %2 step 32 {
|
||||
/// %v = vector_transfer_read %A, %i0, %i1, %i2, %f0
|
||||
/// %v = vector.transfer_read %A[%i0, %i1, %i2], (%f0)
|
||||
/// {permutation_map: (d0, d1, d2) -> (d2, d1)} :
|
||||
/// (memref<?x?x?xf32>, index, index, f32) -> vector<32x256xf32>
|
||||
/// memref<?x?x?xf32>, vector<32x256xf32>
|
||||
/// }}}
|
||||
/// ```
|
||||
///
|
||||
|
|
|
@ -37,10 +37,10 @@
|
|||
#include "mlir/IR/Types.h"
|
||||
#include "mlir/Pass/Pass.h"
|
||||
#include "mlir/StandardOps/Ops.h"
|
||||
#include "mlir/SuperVectorOps/SuperVectorOps.h"
|
||||
#include "mlir/Support/Functional.h"
|
||||
#include "mlir/Support/LLVM.h"
|
||||
#include "mlir/Transforms/Passes.h"
|
||||
#include "mlir/VectorOps/VectorOps.h"
|
||||
|
||||
#include "llvm/Support/CommandLine.h"
|
||||
#include "llvm/Support/Debug.h"
|
||||
|
@ -54,7 +54,7 @@
|
|||
/// a target-independent way: the target vector size is specified as a parameter
|
||||
/// to the pass. This pass is thus a partial lowering that opens the "greybox"
|
||||
/// that is the super-vector abstraction. In particular, this pass can turn the
|
||||
/// vector_transfer_read and vector_transfer_write ops in either:
|
||||
/// vector.transfer_read and vector.transfer_write ops in either:
|
||||
/// 1. a loop nest with either scalar and vector load/store operations; or
|
||||
/// 2. a loop-nest with DmaStartOp / DmaWaitOp; or
|
||||
/// 3. a pre-existing blackbox library call that can be written manually or
|
||||
|
@ -79,14 +79,14 @@
|
|||
/// do not vectorize in the presence of conditionals for now, sliced chains are
|
||||
/// guaranteed not to escape the innermost scope, which has to be either the top
|
||||
/// Function scope or the innermost loop scope, by construction. As a
|
||||
/// consequence, the implementation just starts from vector_transfer_write
|
||||
/// consequence, the implementation just starts from vector.transfer_write
|
||||
/// operations and builds the slice scoped the innermost loop enclosing the
|
||||
/// current vector_transfer_write. These assumptions and the implementation
|
||||
/// current vector.transfer_write. These assumptions and the implementation
|
||||
/// details are subject to revision in the future.
|
||||
///
|
||||
/// Example
|
||||
/// ========
|
||||
/// In the following, the single vector_transfer_write op operates on a
|
||||
/// In the following, the single vector.transfer_write op operates on a
|
||||
/// vector<4x4x4xf32>. Let's assume the HW supports vector<4x4xf32>.
|
||||
/// Materialization is achieved by instantiating each occurrence of the leading
|
||||
/// dimension of vector<4x4x4xf32> into a vector<4x4xf32>.
|
||||
|
@ -98,16 +98,15 @@
|
|||
///
|
||||
/// ```mlir
|
||||
/// mlfunc @materialize(%M : index, %N : index, %O : index, %P : index) {
|
||||
/// %A = alloc (%M, %N, %O, %P) : memref<?x?x?x?xf32, 0>
|
||||
/// %A = alloc (%M, %N, %O, %P) : memref<?x?x?x?xf32>
|
||||
/// %f1 = constant splat<vector<4x4x4xf32>, 1.000000e+00> :
|
||||
/// vector<4x4x4xf32> affine.for %i0 = 0 to %M step 4 {
|
||||
/// affine.for %i1 = 0 to %N step 4 {
|
||||
/// affine.for %i2 = 0 to %O {
|
||||
/// affine.for %i3 = 0 to %P step 4 {
|
||||
/// vector_transfer_write %f1, %A, %i0, %i1, %i2, %i3
|
||||
/// vector.transfer_write %f1, %A[%i0, %i1, %i2, %i3]
|
||||
/// {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d0)} :
|
||||
/// vector<4x4x4xf32>, memref<?x?x?x?xf32, 0>,
|
||||
/// index, index, index, index
|
||||
/// vector<4x4x4xf32>, memref<?x?x?x?xf32>
|
||||
/// }}}}
|
||||
/// return
|
||||
/// }
|
||||
|
@ -123,30 +122,21 @@
|
|||
/// affine.for %i1 = 0 to %arg1 step 4 {
|
||||
/// affine.for %i2 = 0 to %arg2 {
|
||||
/// affine.for %i3 = 0 to %arg3 step 4 {
|
||||
/// %1 = affine.apply (d0, d1, d2, d3) -> (d0, d1, d2, d3)
|
||||
/// (%i0, %i1, %i2, %i3)
|
||||
/// vector_transfer_write f1, %0, %1#0, %1#1, %1#2, %1#3
|
||||
/// vector.transfer_write f1, %0[%i0, %i1, %i2, %i3]
|
||||
/// {permutation_map: (d0, d1, d2, d3) -> (d1, d0)} :
|
||||
/// vector<4x4xf32>, memref<?x?x?x?xf32>,
|
||||
/// index, index, index, index
|
||||
/// %2 = affine.apply (d0, d1, d2, d3) -> (d0, d1, d2, d3 + 1)
|
||||
/// (%i0, %i1, %i2, %i3)
|
||||
/// vector_transfer_write {{.*}}, %0, %2#0, %2#1, %2#2, %2#3
|
||||
/// vector<4x4xf32>, memref<?x?x?x?xf32>
|
||||
/// %i3p1 = affine.apply (d0) -> (d0 + 1)(%i3)
|
||||
/// vector.transfer_write {{.*}}, %0[%i0, %i1, %i2, %i3p1]
|
||||
/// {permutation_map: (d0, d1, d2, d3) -> (d1, d0)} :
|
||||
/// vector<4x4xf32>, memref<?x?x?x?xf32>,
|
||||
/// index, index, index, index
|
||||
/// %3 = affine.apply (d0, d1, d2, d3) -> (d0, d1, d2, d3 + 2)
|
||||
/// (%i0, %i1, %i2, %i3)
|
||||
/// vector_transfer_write {{.*}}, %0, %3#0, %3#1, %3#2, %3#3
|
||||
/// vector<4x4xf32>, memref<?x?x?x?xf32>
|
||||
/// %i3p2 = affine.apply (d0) -> (d0 + 2)(%i3)
|
||||
/// vector.transfer_write {{.*}}, %0[%i0, %i1, %i2, %i3p2]
|
||||
/// {permutation_map: (d0, d1, d2, d3) -> (d1, d0)} :
|
||||
/// vector<4x4xf32>, memref<?x?x?x?xf32>,
|
||||
/// index, index, index, index
|
||||
/// %4 = affine.apply (d0, d1, d2, d3) -> (d0, d1, d2, d3 + 3)
|
||||
/// (%i0, %i1, %i2, %i3)
|
||||
/// vector_transfer_write {{.*}}, %0, %4#0, %4#1, %4#2, %4#3
|
||||
/// vector<4x4xf32>, memref<?x?x?x?xf32>
|
||||
/// %i3p3 = affine.apply (d0) -> (d0 + 3)(%i3)
|
||||
/// vector.transfer_write {{.*}}, %0[%i0, %i1, %i2, %i3p3]
|
||||
/// {permutation_map: (d0, d1, d2, d3) -> (d1, d0)} :
|
||||
/// vector<4x4xf32>, memref<?x?x?x?xf32>,
|
||||
/// index, index, index, index
|
||||
/// vector<4x4xf32>, memref<?x?x?x?xf32>
|
||||
/// }}}}
|
||||
/// return
|
||||
/// }
|
||||
|
@ -282,11 +272,11 @@ static Value *substitute(Value *v, VectorType hwVectorType,
|
|||
|
||||
/// Returns a list of single result AffineApplyOps that reindex the
|
||||
/// `memRefIndices` by the multi-dimensional `hwVectorInstance`. This is used by
|
||||
/// the function that materializes a vector_transfer operation to use hardware
|
||||
/// the function that materializes a vector.transfer operation to use hardware
|
||||
/// vector types instead of super-vector types.
|
||||
///
|
||||
/// The general problem this function solves is as follows:
|
||||
/// Assume a vector_transfer operation at the super-vector granularity that has
|
||||
/// Assume a vector.transfer operation at the super-vector granularity that has
|
||||
/// `l` enclosing loops (AffineForOp). Assume the vector transfer operation
|
||||
/// operates on a MemRef of rank `r`, a super-vector of rank `s` and a hardware
|
||||
/// vector of rank `h`. For the purpose of illustration assume l==4, r==3, s==2,
|
||||
|
@ -299,8 +289,8 @@ static Value *substitute(Value *v, VectorType hwVectorType,
|
|||
/// affine.for %i1 = 0 to %N step 3 {
|
||||
/// affine.for %i2 = 0 to %O {
|
||||
/// affine.for %i3 = 0 to %P step 32 {
|
||||
/// %r = vector_transfer_read(%A, map(%i..)#0, map(%i..)#1, map(%i..)#2)
|
||||
/// -> vector<3x32xf32>
|
||||
/// %r = vector.transfer_read(%A, map0(%i..), map1(%i..), map2(%i..)) :
|
||||
/// vector<3x32xf32>, memref<?x?x?xf32>
|
||||
/// ...
|
||||
/// }}}}
|
||||
/// ```
|
||||
|
@ -308,28 +298,31 @@ static Value *substitute(Value *v, VectorType hwVectorType,
|
|||
/// where map denotes an AffineMap operating on enclosing loops with properties
|
||||
/// compatible for vectorization (i.e. some contiguity left unspecified here).
|
||||
/// Note that the vectorized loops are %i1 and %i3.
|
||||
/// This function translates the vector_transfer_read operation to multiple
|
||||
/// instances of vector_transfer_read that operate on vector<8x32>.
|
||||
/// This function translates the vector.transfer_read operation to multiple
|
||||
/// instances of vector.transfer_read that operate on vector<8x32>.
|
||||
///
|
||||
/// Without loss of generality, we assume hwVectorInstance is: {2, 1}.
|
||||
/// The only constraints on hwVectorInstance is they belong to:
|
||||
/// [0, 2] x [0, 3], which is the span of ratio of super-vector shape to
|
||||
/// hardware vector shape in our example.
|
||||
///
|
||||
/// This function instantiates the iteration <2, 1> of vector_transfer_read
|
||||
/// This function instantiates the iteration <2, 1> of vector.transfer_read
|
||||
/// into the set of operations in pseudo-MLIR:
|
||||
///
|
||||
/// ```mlir
|
||||
/// map2 = (d0, d1, d2, d3) -> (d0, d1 + 2, d2, d3 + 1 * 8)
|
||||
/// map3 = map o map2 // where o denotes composition
|
||||
/// %r = vector_transfer_read(%A, map3(%i..)#0, map3(%i..)#1, map3(%i..)#2)
|
||||
/// -> vector<3x32xf32>
|
||||
/// #map2 = (d0, d1, d2, d3) -> (d0, d1 + 2, d2, d3 + 1 * 8)
|
||||
/// #map3 = #map o #map2 // where o denotes composition
|
||||
/// aff0 = affine.apply #map3.0(%i..)
|
||||
/// aff1 = affine.apply #map3.1(%i..)
|
||||
/// aff2 = affine.apply #map3.2(%i..)
|
||||
/// %r = vector.transfer_read(%A, %aff0, %aff1, %aff2):
|
||||
// vector<3x32xf32>, memref<?x?x?xf32>
|
||||
/// ```
|
||||
///
|
||||
/// Practical considerations
|
||||
/// ========================
|
||||
/// For now, `map` is assumed to be the identity map and the indices are
|
||||
/// specified just as vector_transfer_read(%A, %i0, %i1, %i2, %i3). This will be
|
||||
/// specified just as vector.transfer_read%A[%i0, %i1, %i2, %i3]. This will be
|
||||
/// extended in the future once we have a proper Op for vector transfers.
|
||||
/// Additionally, the example above is specified in pseudo-MLIR form; once we
|
||||
/// have proper support for generic maps we can generate the code and show
|
||||
|
@ -347,7 +340,7 @@ reindexAffineIndices(FuncBuilder *b, VectorType hwVectorType,
|
|||
|
||||
unsigned numIndices = memrefIndices.size();
|
||||
auto numMemRefIndices = numIndices - hwVectorInstance.size();
|
||||
auto numSuperVectorIndices = hwVectorInstance.size() - vectorShape.size();
|
||||
auto numVectorIndices = hwVectorInstance.size() - vectorShape.size();
|
||||
|
||||
SmallVector<AffineExpr, 8> affineExprs;
|
||||
// TODO(ntv): support a concrete map and composition.
|
||||
|
@ -358,11 +351,10 @@ reindexAffineIndices(FuncBuilder *b, VectorType hwVectorType,
|
|||
auto d_i = b->getAffineDimExpr(i);
|
||||
affineExprs.push_back(d_i);
|
||||
}
|
||||
// The next numSuperVectorIndices correspond to super-vector dimensions that
|
||||
// The next numVectorIndices correspond to super-vector dimensions that
|
||||
// do not have a hardware vector dimension counterpart. For those we only
|
||||
// need to increment the index by the corresponding hwVectorInstance.
|
||||
for (i = numMemRefIndices; i < numMemRefIndices + numSuperVectorIndices;
|
||||
++i) {
|
||||
for (i = numMemRefIndices; i < numMemRefIndices + numVectorIndices; ++i) {
|
||||
auto d_i = b->getAffineDimExpr(i);
|
||||
auto offset = hwVectorInstance[i - numMemRefIndices];
|
||||
affineExprs.push_back(d_i + offset);
|
||||
|
@ -374,7 +366,7 @@ reindexAffineIndices(FuncBuilder *b, VectorType hwVectorType,
|
|||
for (; i < numIndices; ++i) {
|
||||
auto d_i = b->getAffineDimExpr(i);
|
||||
auto offset = hwVectorInstance[i - numMemRefIndices];
|
||||
auto stride = vectorShape[i - numMemRefIndices - numSuperVectorIndices];
|
||||
auto stride = vectorShape[i - numMemRefIndices - numVectorIndices];
|
||||
affineExprs.push_back(d_i + offset * stride);
|
||||
}
|
||||
|
||||
|
@ -535,14 +527,14 @@ static Operation *instantiate(FuncBuilder *b, VectorTransferWriteOp write,
|
|||
/// The multi-dimensional `hwVectorInstance` belongs to the shapeRatio of
|
||||
/// super-vector type to hw vector type.
|
||||
/// A cloned instance of `op` is formed as follows:
|
||||
/// 1. vector_transfer_read: the return `superVectorType` is replaced by
|
||||
/// 1. vector.transfer_read: the return `superVectorType` is replaced by
|
||||
/// `hwVectorType`. Additionally, affine indices are reindexed with
|
||||
/// `reindexAffineIndices` using `hwVectorInstance` and vector type
|
||||
/// information;
|
||||
/// 2. vector_transfer_write: the `valueToStore` type is simply substituted.
|
||||
/// 2. vector.transfer_write: the `valueToStore` type is simply substituted.
|
||||
/// Since we operate on a topologically sorted slice, a substitution must
|
||||
/// have been registered for non-constant ops. Additionally, affine indices
|
||||
/// are reindexed in the same way as for vector_transfer_read;
|
||||
/// are reindexed in the same way as for vector.transfer_read;
|
||||
/// 3. constant ops are splats of the super-vector type by construction.
|
||||
/// They are cloned to a splat on the hw vector type with the same value;
|
||||
/// 4. remaining ops are cloned to version of the op that returns a hw vector
|
||||
|
@ -660,7 +652,7 @@ static bool emitSlice(MaterializationState *state,
|
|||
}
|
||||
|
||||
/// Materializes super-vector types into concrete hw vector types as follows:
|
||||
/// 1. start from super-vector terminators (current vector_transfer_write
|
||||
/// 1. start from super-vector terminators (current vector.transfer_write
|
||||
/// ops);
|
||||
/// 2. collect all the operations that can be reached by transitive use-defs
|
||||
/// chains;
|
||||
|
@ -755,13 +747,13 @@ void MaterializeVectorsPass::runOnFunction() {
|
|||
auto subVectorType =
|
||||
VectorType::get(hwVectorSize, FloatType::getF32(&getContext()));
|
||||
|
||||
// Capture terminators; i.e. vector_transfer_write ops involving a strict
|
||||
// Capture terminators; i.e. vector.transfer_write ops involving a strict
|
||||
// super-vector of subVectorType.
|
||||
auto filter = [subVectorType](Operation &op) {
|
||||
if (!op.isa<VectorTransferWriteOp>()) {
|
||||
return false;
|
||||
}
|
||||
return matcher::operatesOnSuperVectors(op, subVectorType);
|
||||
return matcher::operatesOnSuperVectorsOf(op, subVectorType);
|
||||
};
|
||||
auto pat = Op(filter);
|
||||
SmallVector<NestedMatch, 8> matches;
|
||||
|
|
|
@ -108,7 +108,7 @@ void VectorizerTestPass::testVectorShapeRatio(llvm::raw_ostream &outs) {
|
|||
assert(subVectorType.getElementType() ==
|
||||
FloatType::getF32(subVectorType.getContext()) &&
|
||||
"Only f32 supported for now");
|
||||
if (!matcher::operatesOnSuperVectors(op, subVectorType)) {
|
||||
if (!matcher::operatesOnSuperVectorsOf(op, subVectorType)) {
|
||||
return false;
|
||||
}
|
||||
if (op.getNumResults() != 1) {
|
||||
|
|
|
@ -32,10 +32,10 @@
|
|||
#include "mlir/IR/Types.h"
|
||||
#include "mlir/Pass/Pass.h"
|
||||
#include "mlir/StandardOps/Ops.h"
|
||||
#include "mlir/SuperVectorOps/SuperVectorOps.h"
|
||||
#include "mlir/Support/Functional.h"
|
||||
#include "mlir/Support/LLVM.h"
|
||||
#include "mlir/Transforms/Passes.h"
|
||||
#include "mlir/VectorOps/VectorOps.h"
|
||||
|
||||
#include "llvm/ADT/DenseMap.h"
|
||||
#include "llvm/ADT/DenseSet.h"
|
||||
|
@ -69,7 +69,7 @@ using namespace mlir;
|
|||
/// Some may prefer the terminology a "tile of HW vectors". In this case, one
|
||||
/// should note that super-vectors implement an "always full tile" abstraction.
|
||||
/// They guarantee no partial-tile separation is necessary by relying on a
|
||||
/// high-level copy-reshape abstraction that we call vector_transfer. This
|
||||
/// high-level copy-reshape abstraction that we call vector.transfer. This
|
||||
/// copy-reshape operations is also responsible for performing layout
|
||||
/// transposition if necessary. In the general case this will require a scoped
|
||||
/// allocation in some notional local memory.
|
||||
|
@ -115,19 +115,17 @@ using namespace mlir;
|
|||
/// At a high level, a vectorized load in a loop will resemble:
|
||||
/// ```mlir
|
||||
/// affine.for %i = ? to ? step ? {
|
||||
/// %v_a = "vector_transfer_read" (A, %i) : (memref<?xf32>, index) ->
|
||||
/// vector<128xf32>
|
||||
/// %v_a = vector.transfer_read A[%i] : memref<?xf32>, vector<128xf32>
|
||||
/// }
|
||||
/// ```
|
||||
/// It is the reponsibility of the implementation of the vector_transfer_read
|
||||
/// to materialize vector registers from the original scalar memrefs.
|
||||
/// A later (more target-dependent) lowering pass will materialize to actual HW
|
||||
/// vector sizes. This lowering may be occur at different times:
|
||||
/// It is the responsibility of the implementation of vector.transfer_read to
|
||||
/// materialize vector registers from the original scalar memrefs. A later (more
|
||||
/// target-dependent) lowering pass will materialize to actual HW vector sizes.
|
||||
/// This lowering may be occur at different times:
|
||||
/// 1. at the MLIR level into a combination of loops, unrolling, DmaStartOp +
|
||||
/// DmaWaitOp + vectorized operations
|
||||
/// for data transformations and shuffle; thus opening opportunities for
|
||||
/// unrolling and pipelining. This is an instance of library call
|
||||
/// "whiteboxing"; or
|
||||
/// DmaWaitOp + vectorized operations for data transformations and shuffle;
|
||||
/// thus opening opportunities for unrolling and pipelining. This is an
|
||||
/// instance of library call "whiteboxing"; or
|
||||
/// 2. later in the a target-specific lowering pass or hand-written library
|
||||
/// call; achieving full separation of concerns. This is an instance of
|
||||
/// library call; or
|
||||
|
@ -225,7 +223,7 @@ using namespace mlir;
|
|||
/// 3. right after polyhedral-style scheduling: PLUTO-style algorithms are known
|
||||
/// to improve locality, parallelism and be configurable (e.g. max-fuse,
|
||||
/// smart-fuse etc). They can also have adverse effects on contiguity
|
||||
/// properties that are required for vectorization but the vector_transfer
|
||||
/// properties that are required for vectorization but the vector.transfer
|
||||
/// copy-reshape-pad-transpose abstraction is expected to help recapture
|
||||
/// these properties.
|
||||
/// 4. right after polyhedral-style scheduling+tiling;
|
||||
|
@ -265,7 +263,7 @@ using namespace mlir;
|
|||
/// 3. Then, for each pattern in order:
|
||||
/// a. applying iterative rewriting of the loop and the load operations in
|
||||
/// DFS postorder. Rewriting is implemented by coarsening the loops and
|
||||
/// turning load operations into opaque vector_transfer_read ops;
|
||||
/// turning load operations into opaque vector.transfer_read ops;
|
||||
/// b. keeping track of the load operations encountered as "roots" and the
|
||||
/// store operations as "terminals";
|
||||
/// c. traversing the use-def chains starting from the roots and iteratively
|
||||
|
@ -304,7 +302,7 @@ using namespace mlir;
|
|||
/// %s5 = addf %a5, %b5 : f32`
|
||||
///
|
||||
/// Lastly, we show a minimal example for which use-def chains rooted in load /
|
||||
/// vector_transfer_read are not enough. This is what motivated splitting
|
||||
/// vector.transfer_read are not enough. This is what motivated splitting
|
||||
/// terminal processing out of the use-def chains starting from loads. In the
|
||||
/// following snippet, there is simply no load::
|
||||
/// ```mlir
|
||||
|
@ -343,8 +341,7 @@ using namespace mlir;
|
|||
/// scheduling, so we want to generate a pattern that resembles:
|
||||
/// ```mlir
|
||||
/// affine.for %i = ? to ? step ? {
|
||||
/// %v_a = "vector_transfer_read" (A, %i) : (memref<?xf32>, index) ->
|
||||
/// vector<128xf32>
|
||||
/// %v_a = vector.transfer_read A[%i] : memref<?xf32>, vector<128xf32>
|
||||
/// }
|
||||
/// ```
|
||||
///
|
||||
|
@ -364,8 +361,7 @@ using namespace mlir;
|
|||
/// abstraction of size 128 returns code similar to:
|
||||
/// ```mlir
|
||||
/// affine.for %i = %M to %N step 128 {
|
||||
/// %v_a = "vector_transfer_read" (A, %i) : (memref<?xf32>, index) ->
|
||||
/// vector<128xf32>
|
||||
/// %v_a = vector.transfer_read A[%i] : memref<?xf32>, vector<128xf32>
|
||||
/// }
|
||||
/// ```
|
||||
///
|
||||
|
@ -373,9 +369,9 @@ using namespace mlir;
|
|||
/// ========================================================================
|
||||
/// 1. lowering to concrete vector types for various HW;
|
||||
/// 2. reduction support;
|
||||
/// 3. non-effecting padding during vector_transfer_read and filter during
|
||||
/// vector_transfer_write;
|
||||
/// 4. misalignment support vector_transfer_read / vector_transfer_write
|
||||
/// 3. non-effecting padding during vector.transfer_read and filter during
|
||||
/// vector.transfer_write;
|
||||
/// 4. misalignment support vector.transfer_read / vector.transfer_write
|
||||
/// (hopefully without read-modify-writes);
|
||||
/// 5. control-flow support;
|
||||
/// 6. cost-models, heuristics and search;
|
||||
|
@ -443,24 +439,24 @@ using namespace mlir;
|
|||
/// affine.for %i1 = 0 to %arg1 step 256 {
|
||||
/// %cst_1 = constant splat<vector<256xf32>, 1.0> :
|
||||
/// vector<256xf32>
|
||||
/// "vector_transfer_write"(%cst_1, %0, %i0, %i1) :
|
||||
/// (vector<256xf32>, memref<?x?xf32>, index, index) -> ()
|
||||
/// vector.transfer_write %cst_1, %0[%i0, %i1] :
|
||||
/// vector<256xf32>, memref<?x?xf32>
|
||||
/// }
|
||||
/// }
|
||||
/// affine.for %i2 = 0 to %arg0 {
|
||||
/// affine.for %i3 = 0 to %arg1 step 256 {
|
||||
/// %cst_2 = constant splat<vector<256xf32>, 2.0> :
|
||||
/// vector<256xf32>
|
||||
/// "vector_transfer_write"(%cst_2, %1, %i2, %i3) :
|
||||
/// (vector<256xf32>, memref<?x?xf32>, index, index) -> ()
|
||||
/// vector.transfer_write %cst_2, %1[%i2, %i3] :
|
||||
/// vector<256xf32>, memref<?x?xf32>
|
||||
/// }
|
||||
/// }
|
||||
/// affine.for %i4 = 0 to %arg0 {
|
||||
/// affine.for %i5 = 0 to %arg1 step 256 {
|
||||
/// %3 = "vector_transfer_read"(%0, %i4, %i5) :
|
||||
/// (memref<?x?xf32>, index, index) -> vector<256xf32>
|
||||
/// %4 = "vector_transfer_read"(%1, %i4, %i5) :
|
||||
/// (memref<?x?xf32>, index, index) -> vector<256xf32>
|
||||
/// %3 = vector.transfer_read %0[%i4, %i5] :
|
||||
/// memref<?x?xf32>, vector<256xf32>
|
||||
/// %4 = vector.transfer_read %1[%i4, %i5] :
|
||||
/// memref<?x?xf32>, vector<256xf32>
|
||||
/// %5 = addf %3, %4 : vector<256xf32>
|
||||
/// %cst_3 = constant splat<vector<256xf32>, 1.0> :
|
||||
/// vector<256xf32>
|
||||
|
@ -469,8 +465,8 @@ using namespace mlir;
|
|||
/// vector<256xf32>
|
||||
/// %7 = addf %5, %cst_4 : vector<256xf32>
|
||||
/// %8 = addf %7, %6 : vector<256xf32>
|
||||
/// "vector_transfer_write"(%8, %2, %i4, %i5) :
|
||||
/// (vector<256xf32>, memref<?x?xf32>, index, index) -> ()
|
||||
/// vector.transfer_write %8, %2[%i4, %i5] :
|
||||
/// vector<256xf32>, memref<?x?xf32>
|
||||
/// }
|
||||
/// }
|
||||
/// %c7 = constant 7 : index
|
||||
|
@ -499,24 +495,24 @@ using namespace mlir;
|
|||
/// affine.for %i1 = 0 to %arg1 step 256 {
|
||||
/// %cst_1 = constant splat<vector<32x256xf32>, 1.0> :
|
||||
/// vector<32x256xf32>
|
||||
/// "vector_transfer_write"(%cst_1, %0, %i0, %i1) :
|
||||
/// (vector<32x256xf32>, memref<?x?xf32>, index, index) -> ()
|
||||
/// vector.transfer_write %cst_1, %0[%i0, %i1] :
|
||||
/// vector<32x256xf32>, memref<?x?xf32>
|
||||
/// }
|
||||
/// }
|
||||
/// affine.for %i2 = 0 to %arg0 step 32 {
|
||||
/// affine.for %i3 = 0 to %arg1 step 256 {
|
||||
/// %cst_2 = constant splat<vector<32x256xf32>, 2.0> :
|
||||
/// vector<32x256xf32>
|
||||
/// "vector_transfer_write"(%cst_2, %1, %i2, %i3) :
|
||||
/// (vector<32x256xf32>, memref<?x?xf32>, index, index) -> ()
|
||||
/// vector.transfer_write %cst_2, %1[%i2, %i3] :
|
||||
/// vector<32x256xf32>, memref<?x?xf32>
|
||||
/// }
|
||||
/// }
|
||||
/// affine.for %i4 = 0 to %arg0 step 32 {
|
||||
/// affine.for %i5 = 0 to %arg1 step 256 {
|
||||
/// %3 = "vector_transfer_read"(%0, %i4, %i5) :
|
||||
/// (memref<?x?xf32>, index, index) -> vector<32x256xf32>
|
||||
/// %4 = "vector_transfer_read"(%1, %i4, %i5) :
|
||||
/// (memref<?x?xf32>, index, index) -> vector<32x256xf32>
|
||||
/// %3 = vector.transfer_read %0[%i4, %i5] :
|
||||
/// memref<?x?xf32> vector<32x256xf32>
|
||||
/// %4 = vector.transfer_read %1[%i4, %i5] :
|
||||
/// memref<?x?xf32>, vector<32x256xf32>
|
||||
/// %5 = addf %3, %4 : vector<32x256xf32>
|
||||
/// %cst_3 = constant splat<vector<32x256xf32>, 1.0> :
|
||||
/// vector<32x256xf32>
|
||||
|
@ -525,8 +521,8 @@ using namespace mlir;
|
|||
/// vector<32x256xf32>
|
||||
/// %7 = addf %5, %cst_4 : vector<32x256xf32>
|
||||
/// %8 = addf %7, %6 : vector<32x256xf32>
|
||||
/// "vector_transfer_write"(%8, %2, %i4, %i5) :
|
||||
/// (vector<32x256xf32>, memref<?x?xf32>, index, index) -> ()
|
||||
/// vector.transfer_write %8, %2[%i4, %i5] :
|
||||
/// vector<32x256xf32>, memref<?x?xf32>
|
||||
/// }
|
||||
/// }
|
||||
/// %c7 = constant 7 : index
|
||||
|
@ -788,7 +784,7 @@ void VectorizationState::registerReplacement(Value *key, Value *value) {
|
|||
/// Handles the vectorization of load and store MLIR operations.
|
||||
///
|
||||
/// LoadOp operations are the roots of the vectorizeNonTerminals call. They are
|
||||
/// vectorized immediately. The resulting vector_transfer_read is immediately
|
||||
/// vectorized immediately. The resulting vector.transfer_read is immediately
|
||||
/// registered to replace all uses of the LoadOp in this pattern's scope.
|
||||
///
|
||||
/// StoreOp are the terminals of the vectorizeNonTerminals call. They need to be
|
||||
|
@ -811,9 +807,9 @@ static LogicalResult vectorizeRootOrTerminal(Value *iv,
|
|||
|
||||
// Materialize a MemRef with 1 vector.
|
||||
auto *opInst = memoryOp.getOperation();
|
||||
// For now, vector_transfers must be aligned, operate only on indices with an
|
||||
// For now, vector.transfers must be aligned, operate only on indices with an
|
||||
// identity subset of AffineMap and do not change layout.
|
||||
// TODO(ntv): increase the expressiveness power of vector_transfer operations
|
||||
// TODO(ntv): increase the expressiveness power of vector.transfer operations
|
||||
// as needed by various targets.
|
||||
if (opInst->template isa<LoadOp>()) {
|
||||
auto permutationMap =
|
||||
|
@ -835,7 +831,7 @@ static LogicalResult vectorizeRootOrTerminal(Value *iv,
|
|||
/// end TODO(ntv): Hoist to a VectorizationMaterialize.cpp when appropriate. ///
|
||||
|
||||
/// Coarsens the loops bounds and transforms all remaining load and store
|
||||
/// operations into the appropriate vector_transfer.
|
||||
/// operations into the appropriate vector.transfer.
|
||||
static LogicalResult vectorizeAffineForOp(AffineForOp loop, int64_t step,
|
||||
VectorizationState *state) {
|
||||
using namespace functional;
|
||||
|
@ -1023,9 +1019,9 @@ static Operation *vectorizeOneOperation(Operation *opInst,
|
|||
assert(!opInst->isa<LoadOp>() &&
|
||||
"all loads must have already been fully vectorized independently");
|
||||
assert(!opInst->isa<VectorTransferReadOp>() &&
|
||||
"vector_transfer_read cannot be further vectorized");
|
||||
"vector.transfer_read cannot be further vectorized");
|
||||
assert(!opInst->isa<VectorTransferWriteOp>() &&
|
||||
"vector_transfer_write cannot be further vectorized");
|
||||
"vector.transfer_write cannot be further vectorized");
|
||||
|
||||
if (auto store = opInst->dyn_cast<StoreOp>()) {
|
||||
auto *memRef = store.getMemRef();
|
||||
|
|
|
@ -15,8 +15,8 @@
|
|||
// limitations under the License.
|
||||
// =============================================================================
|
||||
|
||||
#include "mlir/SuperVectorOps/SuperVectorOps.h"
|
||||
#include "mlir/VectorOps/VectorOps.h"
|
||||
using namespace mlir;
|
||||
|
||||
// Static initialization for SuperVectorOps dialect registration.
|
||||
static DialectRegistration<SuperVectorOpsDialect> SuperVectorOps;
|
||||
// Static initialization for VectorOps dialect registration.
|
||||
static DialectRegistration<VectorOpsDialect> VectorOps;
|
|
@ -1,4 +1,4 @@
|
|||
//===- SuperVectorOps.cpp - MLIR Super Vectorizer Operations---------------===//
|
||||
//===- VectorOps.cpp - MLIR Super Vectorizer Operations -------------------===//
|
||||
//
|
||||
// Copyright 2019 The MLIR Authors.
|
||||
//
|
||||
|
@ -20,7 +20,7 @@
|
|||
//
|
||||
//===----------------------------------------------------------------------===//
|
||||
|
||||
#include "mlir/SuperVectorOps/SuperVectorOps.h"
|
||||
#include "mlir/VectorOps/VectorOps.h"
|
||||
#include "mlir/IR/AffineExpr.h"
|
||||
#include "mlir/IR/AffineMap.h"
|
||||
#include "mlir/IR/Builders.h"
|
||||
|
@@ -29,11 +29,11 @@
 using namespace mlir;

 //===----------------------------------------------------------------------===//
-// SuperVectorOpsDialect
+// VectorOpsDialect
 //===----------------------------------------------------------------------===//

-SuperVectorOpsDialect::SuperVectorOpsDialect(MLIRContext *context)
-: Dialect(/*namePrefix=*/"", context) {
+VectorOpsDialect::VectorOpsDialect(MLIRContext *context)
+: Dialect("vector", context) {
 addOperations<VectorTransferReadOp, VectorTransferWriteOp,
 VectorTypeCastOp>();
 }
@@ -107,88 +107,65 @@ AffineMap VectorTransferReadOp::getPermutationMap() {
 void VectorTransferReadOp::print(OpAsmPrinter *p) {
 *p << getOperationName() << " ";
 p->printOperand(getMemRef());
-*p << ", ";
+*p << "[";
 p->printOperands(getIndices());
+*p << "]";
 auto optionalPaddingValue = getPaddingValue();
 if (optionalPaddingValue) {
-*p << ", ";
+*p << ", (";
 p->printOperand(*optionalPaddingValue);
+*p << ")";
 }
 p->printOptionalAttrDict(getAttrs());
-// Construct the FunctionType and print it.
-llvm::SmallVector<Type, 8> inputs{getMemRefType()};
-// Must have at least one actual index, see verify.
-Value *firstIndex = *getIndices().begin();
-Type indexType = firstIndex->getType();
-inputs.append(getMemRefType().getRank(), indexType);
-if (optionalPaddingValue) {
-inputs.push_back((*optionalPaddingValue)->getType());
-}
-*p << " : "
-<< FunctionType::get(inputs, {getResultType()}, indexType.getContext());
+*p << " : " << getMemRefType();
+*p << ", " << getResultType();
 }

 bool VectorTransferReadOp::parse(OpAsmParser *parser, OperationState *result) {
-SmallVector<OpAsmParser::OperandType, 8> parsedOperands;
-Type type;
+OpAsmParser::OperandType memrefInfo;
+SmallVector<OpAsmParser::OperandType, 8> indexInfo;
+SmallVector<OpAsmParser::OperandType, 8> paddingInfo;
+SmallVector<Type, 2> types;

 // Parsing with support for optional paddingValue.
-auto fail = parser->parseOperandList(parsedOperands) ||
-parser->parseOptionalAttributeDict(result->attributes) ||
-parser->parseColonType(type);
-if (fail) {
+if (parser->parseOperand(memrefInfo) ||
+parser->parseOperandList(indexInfo, -1, OpAsmParser::Delimiter::Square) ||
+parser->parseTrailingOperandList(paddingInfo, -1,
+OpAsmParser::Delimiter::Paren) ||
+parser->parseOptionalAttributeDict(result->attributes) ||
+parser->parseColonTypeList(types))
 return true;
-}

 // Resolution.
-auto funType = type.dyn_cast<FunctionType>();
-if (!funType)
-return parser->emitError(parser->getNameLoc(), "Function type expected");
-if (funType.getNumInputs() < 1)
-return parser->emitError(parser->getNameLoc(),
-"Function type expects at least one input");
-MemRefType memrefType =
-funType.getInput(Offsets::MemRefOffset).dyn_cast<MemRefType>();
+if (types.size() != 2)
+return parser->emitError(parser->getNameLoc(), "expected 2 types");
+MemRefType memrefType = types[0].dyn_cast<MemRefType>();
 if (!memrefType)
-return parser->emitError(parser->getNameLoc(),
-"MemRef type expected for first input");
-if (funType.getNumResults() < 1)
-return parser->emitError(parser->getNameLoc(),
-"Function type expects exactly one vector result");
-VectorType vectorType = funType.getResult(0).dyn_cast<VectorType>();
+return parser->emitError(parser->getNameLoc(), "memRef type expected");
+VectorType vectorType = types[1].dyn_cast<VectorType>();
 if (!vectorType)
-return parser->emitError(parser->getNameLoc(),
-"Vector type expected for first result");
-if (parsedOperands.size() != funType.getNumInputs())
-return parser->emitError(parser->getNameLoc(),
-"requires " + Twine(funType.getNumInputs()) +
-" operands");
+return parser->emitError(parser->getNameLoc(), "vector type expected");

-// Extract optional paddingValue.
-OpAsmParser::OperandType memrefInfo = parsedOperands[0];
-// At this point, indexInfo may contain the optional paddingValue, pop it out.
-SmallVector<OpAsmParser::OperandType, 8> indexInfo{
-parsedOperands.begin() + Offsets::FirstIndexOffset, parsedOperands.end()};
+if (indexInfo.size() != memrefType.getRank())
+return parser->emitError(parser->getNameLoc(),
+"expected " + Twine(memrefType.getRank()) +
+" indices to the memref");
+if (paddingInfo.size() > 1)
+return parser->emitError(parser->getNameLoc(),
+"expected at most one padding value");
 Type paddingType;
-OpAsmParser::OperandType paddingValue;
-bool hasPaddingValue = indexInfo.size() > memrefType.getRank();
-unsigned expectedNumOperands = Offsets::FirstIndexOffset +
-memrefType.getRank() +
-(hasPaddingValue ? 1 : 0);
-if (hasPaddingValue) {
-paddingType = funType.getInputs().back();
-paddingValue = indexInfo.pop_back_val();
+bool hasOptionalPaddingValue = !paddingInfo.empty();
+if (hasOptionalPaddingValue) {
+paddingType = vectorType.getElementType();
 }
-if (funType.getNumInputs() != expectedNumOperands)
-return parser->emitError(
-parser->getNameLoc(),
-"requires actual number of operands to match function type");

 auto indexType = parser->getBuilder().getIndexType();
 return parser->resolveOperand(memrefInfo, memrefType, result->operands) ||
 parser->resolveOperands(indexInfo, indexType, result->operands) ||
-(hasPaddingValue && parser->resolveOperand(paddingValue, paddingType,
-result->operands)) ||
+(hasOptionalPaddingValue &&
+parser->resolveOperand(paddingInfo[0], paddingType,
+result->operands)) ||
 parser->addTypeToList(vectorType, result->types);
 }

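With the new parser above, the custom form carries exactly two types (the memref, then the vector), the indices appear in square brackets, and an optional padding value rides along as a trailing parenthesized operand whose type is the vector's element type. A sketch of both accepted forms, reusing the operands of the round-trip test updated later in this change:

```
%0 = vector.transfer_read %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0)} : memref<?x?xf32>, vector<128xf32>
%2 = vector.transfer_read %arg0[%c3, %c3], (%cst) {permutation_map: (d0, d1)->(d0)} : memref<?x?xf32>, vector<128xf32>
```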
@@ -204,7 +181,7 @@ bool VectorTransferReadOp::verify() {
 // Consistency of vector type in function type.
 if (!getResult()->getType().isa<VectorType>()) {
 return emitOpError("should have a vector result type in function type: "
-"(memref_type [, elemental_type]) -> vector_type");
+"memref_type<...xelemental_type>, vector_type");
 }
 // Consistency of elemental types in memref and vector.
 MemRefType memrefType = getMemRefType();
@@ -219,8 +196,9 @@ bool VectorTransferReadOp::verify() {
 (optionalPaddingValue ? 1 : 0);
 // Checks on the actual operands and their types.
 if (getNumOperands() != expectedNumOperands) {
-return emitOpError("expects " + Twine(expectedNumOperands) +
-" operands to match the types");
+return emitOpError("expects " + Twine(expectedNumOperands) + " operands " +
+"(of which " + Twine(memrefType.getRank()) +
+" indices)");
 }
 // Consistency of padding value with vector type.
 if (optionalPaddingValue) {
@@ -239,7 +217,7 @@ bool VectorTransferReadOp::verify() {
 for (auto *idx : getIndices()) {
 if (!idx->getType().isIndex()) {
 return emitOpError(
-"index to vector_transfer_read must have 'index' type");
+"index to vector.transfer_read must have 'index' type");
 }
 ++numIndices;
 }
@@ -301,63 +279,41 @@ void VectorTransferWriteOp::print(OpAsmPrinter *p) {
 *p << getOperationName();
 *p << " " << *getVector();
 *p << ", " << *getMemRef();
-*p << ", ";
+*p << "[";
 p->printOperands(getIndices());
+*p << "]";
 p->printOptionalAttrDict(getAttrs());
-Type indexType = (*getIndices().begin())->getType();
 *p << " : ";
 p->printType(getVectorType());
 *p << ", ";
 p->printType(getMemRefType());
-for (unsigned r = 0, n = getMemRefType().getRank(); r < n; ++r) {
-*p << ", ";
-p->printType(indexType);
-}
 }

 bool VectorTransferWriteOp::parse(OpAsmParser *parser, OperationState *result) {
-SmallVector<OpAsmParser::OperandType, 8> parsedOperands;
-SmallVector<Type, 8> types;

-// Parsing with support for optional paddingValue.
-auto fail = parser->parseOperandList(parsedOperands) ||
-parser->parseOptionalAttributeDict(result->attributes) ||
-parser->parseColonTypeList(types);
-if (fail) {
+OpAsmParser::OperandType storeValueInfo;
+OpAsmParser::OperandType memrefInfo;
+SmallVector<OpAsmParser::OperandType, 4> indexInfo;
+SmallVector<Type, 2> types;
+auto indexType = parser->getBuilder().getIndexType();
+if (parser->parseOperand(storeValueInfo) || parser->parseComma() ||
+parser->parseOperand(memrefInfo) ||
+parser->parseOperandList(indexInfo, -1, OpAsmParser::Delimiter::Square) ||
+parser->parseOptionalAttributeDict(result->attributes) ||
+parser->parseColonTypeList(types))
 return true;
-}

 // Resolution.
-if (parsedOperands.size() != types.size())
-return parser->emitError(
-parser->getNameLoc(),
-"requires number of operands and input types to match");
-if (parsedOperands.size() < Offsets::FirstIndexOffset)
-return parser->emitError(parser->getNameLoc(),
-"requires at least vector and memref operands");
+if (types.size() != 2)
+return parser->emitError(parser->getNameLoc(), "expected 2 types");
 VectorType vectorType = types[Offsets::VectorOffset].dyn_cast<VectorType>();
 if (!vectorType)
-return parser->emitError(parser->getNameLoc(),
-"Vector type expected for first input type");
+return parser->emitError(parser->getNameLoc(), "vector type expected");
 MemRefType memrefType = types[Offsets::MemRefOffset].dyn_cast<MemRefType>();
 if (!memrefType)
-return parser->emitError(parser->getNameLoc(),
-"MemRef type expected for second input type");
+return parser->emitError(parser->getNameLoc(), "memRef type expected");

-unsigned expectedNumOperands =
-Offsets::FirstIndexOffset + memrefType.getRank();
-if (parsedOperands.size() != expectedNumOperands)
-return parser->emitError(parser->getNameLoc(),
-"requires " + Twine(expectedNumOperands) +
-" operands");

-OpAsmParser::OperandType vectorInfo = parsedOperands[Offsets::VectorOffset];
-OpAsmParser::OperandType memrefInfo = parsedOperands[Offsets::MemRefOffset];
-SmallVector<OpAsmParser::OperandType, 8> indexInfo{
-parsedOperands.begin() + Offsets::FirstIndexOffset, parsedOperands.end()};
-auto indexType = parser->getBuilder().getIndexType();
-return parser->resolveOperand(vectorInfo, vectorType, result->operands) ||
-parser->resolveOperand(memrefInfo, memrefType, result->operands) ||
+return parser->resolveOperands(storeValueInfo, vectorType,
+result->operands) ||
+parser->resolveOperands(memrefInfo, memrefType, result->operands) ||
 parser->resolveOperands(indexInfo, indexType, result->operands);
 }

@@ -386,15 +342,16 @@ bool VectorTransferWriteOp::verify() {
 Offsets::FirstIndexOffset + memrefType.getRank();
 // Checks on the actual operands and their types.
 if (getNumOperands() != expectedNumOperands) {
-return emitOpError("expects " + Twine(expectedNumOperands) +
-" operands to match the types");
+return emitOpError("expects " + Twine(expectedNumOperands) + " operands " +
+"(of which " + Twine(memrefType.getRank()) +
+" indices)");
 }
 // Consistency of indices types.
 unsigned numIndices = 0;
 for (auto *idx : getIndices()) {
 if (!idx->getType().isIndex()) {
 return emitOpError(
-"index to vector_transfer_write must have 'index' type");
+"index to vector.transfer_write must have 'index' type");
 }
 numIndices++;
 }

@@ -565,10 +565,10 @@ TEST_FUNC(vectorize_2d) {
 // CHECK-NEXT: affine.for %i0 = 0 to (d0) -> (d0)(%[[M]]) {
 // CHECK-NEXT: affine.for %i1 = 0 to (d0) -> (d0)(%[[N]]) step 4 {
 // CHECK-NEXT: affine.for %i2 = 0 to (d0) -> (d0)(%[[P]]) step 4 {
-// CHECK-NEXT: %[[vA:.*]] = "vector_transfer_read"(%arg1, %i0, %i1, %i2) {permutation_map: (d0, d1, d2) -> (d1, d2)} : (memref<?x?x?xf32>, index, index, index) -> vector<4x4xf32>
-// CHECK-NEXT: %[[vB:.*]] = "vector_transfer_read"(%arg0, %i0, %i1, %i2) {permutation_map: (d0, d1, d2) -> (d1, d2)} : (memref<?x?x?xf32>, index, index, index) -> vector<4x4xf32>
+// CHECK-NEXT: %[[vA:.*]] = "vector.transfer_read"(%arg1, %i0, %i1, %i2) {permutation_map: (d0, d1, d2) -> (d1, d2)} : (memref<?x?x?xf32>, index, index, index) -> vector<4x4xf32>
+// CHECK-NEXT: %[[vB:.*]] = "vector.transfer_read"(%arg0, %i0, %i1, %i2) {permutation_map: (d0, d1, d2) -> (d1, d2)} : (memref<?x?x?xf32>, index, index, index) -> vector<4x4xf32>
 // CHECK-NEXT: %[[vRES:.*]] = addf %[[vB]], %[[vA]] : vector<4x4xf32>
-// CHECK-NEXT: "vector_transfer_write"(%[[vRES:.*]], %arg2, %i0, %i1, %i2) {permutation_map: (d0, d1, d2) -> (d1, d2)} : (vector<4x4xf32>, memref<?x?x?xf32>, index, index, index) -> ()
+// CHECK-NEXT: "vector.transfer_write"(%[[vRES:.*]], %arg2, %i0, %i1, %i2) {permutation_map: (d0, d1, d2) -> (d1, d2)} : (vector<4x4xf32>, memref<?x?x?xf32>, index, index, index) -> ()
 // clang-format on

 mlir::PassManager pm;

@@ -330,22 +330,22 @@ func @test_dimop(%arg0: tensor<4x4x?xf32>) {
 }


-// CHECK-LABEL: func @test_vector_transfer_ops(%arg0
-func @test_vector_transfer_ops(%arg0: memref<?x?xf32>) {
+// CHECK-LABEL: func @test_vector.transfer_ops(%arg0
+func @test_vector.transfer_ops(%arg0: memref<?x?xf32>) {
 %c3 = constant 3 : index
 %cst = constant 3.0 : f32
-// CHECK: %0 = vector_transfer_read %arg0, %c3, %c3 {permutation_map: #[[map_proj_d0d1_d0]]} : (memref<?x?xf32>, index, index) -> vector<128xf32>
-%0 = vector_transfer_read %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d0)} : (memref<?x?xf32>, index, index) -> vector<128xf32>
-// CHECK: %1 = vector_transfer_read %arg0, %c3, %c3 {permutation_map: #[[map_proj_d0d1_d1d0]]} : (memref<?x?xf32>, index, index) -> vector<3x7xf32>
-%1 = vector_transfer_read %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d1, d0)} : (memref<?x?xf32>, index, index) -> vector<3x7xf32>
-// CHECK: %2 = vector_transfer_read %arg0, %c3, %c3, %cst {permutation_map: #[[map_proj_d0d1_d0]]} : (memref<?x?xf32>, index, index, f32) -> vector<128xf32>
-%2 = vector_transfer_read %arg0, %c3, %c3, %cst {permutation_map: (d0, d1)->(d0)} : (memref<?x?xf32>, index, index, f32) -> vector<128xf32>
-// CHECK: %3 = vector_transfer_read %arg0, %c3, %c3, %cst {permutation_map: #[[map_proj_d0d1_d1]]} : (memref<?x?xf32>, index, index, f32) -> vector<128xf32>
-%3 = vector_transfer_read %arg0, %c3, %c3, %cst {permutation_map: (d0, d1)->(d1)} : (memref<?x?xf32>, index, index, f32) -> vector<128xf32>
+// CHECK: %0 = vector.transfer_read %arg0[%c3, %c3] {permutation_map: #[[map_proj_d0d1_d0]]} : memref<?x?xf32>, vector<128xf32>
+%0 = vector.transfer_read %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0)} : memref<?x?xf32>, vector<128xf32>
+// CHECK: %1 = vector.transfer_read %arg0[%c3, %c3] {permutation_map: #[[map_proj_d0d1_d1d0]]} : memref<?x?xf32>, vector<3x7xf32>
+%1 = vector.transfer_read %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d1, d0)} : memref<?x?xf32>, vector<3x7xf32>
+// CHECK: %2 = vector.transfer_read %arg0[%c3, %c3], (%cst) {permutation_map: #[[map_proj_d0d1_d0]]} : memref<?x?xf32>, vector<128xf32>
+%2 = vector.transfer_read %arg0[%c3, %c3], (%cst) {permutation_map: (d0, d1)->(d0)} : memref<?x?xf32>, vector<128xf32>
+// CHECK: %3 = vector.transfer_read %arg0[%c3, %c3], (%cst) {permutation_map: #[[map_proj_d0d1_d1]]} : memref<?x?xf32>, vector<128xf32>
+%3 = vector.transfer_read %arg0[%c3, %c3], (%cst) {permutation_map: (d0, d1)->(d1)} : memref<?x?xf32>, vector<128xf32>
 //
-// CHECK: vector_transfer_write %0, %arg0, %c3, %c3 {permutation_map: #[[map_proj_d0d1_d0]]} : vector<128xf32>, memref<?x?xf32>, index, index
-vector_transfer_write %0, %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d0)} : vector<128xf32>, memref<?x?xf32>, index, index
-// CHECK: vector_transfer_write %1, %arg0, %c3, %c3 {permutation_map: #[[map_proj_d0d1_d1d0]]} : vector<3x7xf32>, memref<?x?xf32>, index, index
-vector_transfer_write %1, %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d1, d0)} : vector<3x7xf32>, memref<?x?xf32>, index, index
+// CHECK: vector.transfer_write %0, %arg0[%c3, %c3] {permutation_map: #[[map_proj_d0d1_d0]]} : vector<128xf32>, memref<?x?xf32>
+vector.transfer_write %0, %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0)} : vector<128xf32>, memref<?x?xf32>
+// CHECK: vector.transfer_write %1, %arg0[%c3, %c3] {permutation_map: #[[map_proj_d0d1_d1d0]]} : vector<3x7xf32>, memref<?x?xf32>
+vector.transfer_write %1, %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d1, d0)} : vector<3x7xf32>, memref<?x?xf32>
 return
 }

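The negative tests that follow exercise the permutation_map verifier. As a reading aid, a hedged summary with the maps taken from those tests (the accepted example reuses an op from the materialize_read test later in this change):

```
// Accepted: a projected permutation - each result is a distinct input dim,
// or the zero constant, which broadcasts along that result, e.g.
%f = vector.transfer_read %A[%i0, %i1, %i2, %i3] {permutation_map: (d0, d1, d2, d3) -> (d3, 0, d0)} : memref<?x?x?x?xf32>, vector<5x4x3xf32>
// Rejected: compound expressions such as (d0, d1) -> (d0 + d1), and maps
// that reuse a dim, such as (d0, d1, d2) -> (d0, d0).
```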
@ -296,180 +296,181 @@ func @func_with_ops(tensor<?xi1>, tensor<42xi32>, tensor<42xi32>) {
|
|||
|
||||
// -----
|
||||
|
||||
func @test_vector_transfer_read(memref<?x?xf32>) {
|
||||
func @test_vector.transfer_read(memref<?x?xf32>) {
|
||||
^bb0(%arg0: memref<?x?xf32>):
|
||||
%c3 = constant 3 : index
|
||||
%cst = constant 3.0 : f32
|
||||
// expected-error@+1 {{expected 4 operand types but had 3}}
|
||||
%0 = "vector_transfer_read"(%arg0, %c3, %c3, %c3) : (memref<?x?xf32>, index, index) -> vector<128xf32>
|
||||
// expected-error@+1 {{expected 2 types}}
|
||||
%0 = vector.transfer_read %arg0[%c3, %c3] : memref<?x?xf32>
|
||||
}
|
||||
|
||||
// -----
|
||||
|
||||
func @test_vector_transfer_read(memref<?x?xf32>) {
|
||||
func @test_vector.transfer_read(memref<?x?xf32>) {
|
||||
^bb0(%arg0: memref<?x?xf32>):
|
||||
%c3 = constant 3 : index
|
||||
%cst = constant 3.0 : f32
|
||||
// expected-error@+1 {{requires 3 operands}}
|
||||
%0 = vector_transfer_read %arg0, %c3, %c3, %c3 : (memref<?x?xf32>, index, index) -> vector<128xf32>
|
||||
// expected-error@+1 {{expected 2 indices to the memref}}
|
||||
%0 = vector.transfer_read %arg0[%c3, %c3, %c3] : memref<?x?xf32>, vector<128xf32>
|
||||
}
|
||||
|
||||
// -----
|
||||
|
||||
func @test_vector_transfer_read(memref<?x?xf32>) {
|
||||
func @test_vector.transfer_read(memref<?x?xf32>) {
|
||||
^bb0(%arg0: memref<?x?xf32>):
|
||||
%c3 = constant 3 : index
|
||||
%cst = constant 3.0 : f32
|
||||
// expected-error@+1 {{requires an AffineMapAttr named 'permutation_map'}}
|
||||
%0 = vector_transfer_read %arg0, %c3, %c3 : (memref<?x?xf32>, index, index) -> vector<128xf32>
|
||||
%0 = vector.transfer_read %arg0[%c3, %c3] : memref<?x?xf32>, vector<128xf32>
|
||||
}
|
||||
|
||||
// -----
|
||||
|
||||
func @test_vector_transfer_read(memref<?x?xf32>) {
|
||||
func @test_vector.transfer_read(memref<?x?xf32>) {
|
||||
^bb0(%arg0: memref<?x?xf32>):
|
||||
%c3 = constant 3 : index
|
||||
%cst = constant 3.0 : f32
|
||||
// expected-error@+1 {{requires an AffineMapAttr named 'permutation_map'}}
|
||||
%0 = vector_transfer_read %arg0, %c3, %c3 {perm: (d0)->(d0)} : (memref<?x?xf32>, index, index) -> vector<128xf32>
|
||||
%0 = vector.transfer_read %arg0[%c3, %c3] {perm: (d0)->(d0)} : memref<?x?xf32>, vector<128xf32>
|
||||
}
|
||||
|
||||
// -----
|
||||
|
||||
func @test_vector_transfer_read(memref<?x?xf32>) {
|
||||
func @test_vector.transfer_read(memref<?x?xf32>) {
|
||||
^bb0(%arg0: memref<?x?xf32>):
|
||||
%c3 = constant 3 : index
|
||||
%cst = constant 3.0 : f32
|
||||
// expected-error@+1 {{requires a permutation_map with input dims of the same rank as the memref type}}
|
||||
%0 = vector_transfer_read %arg0, %c3, %c3 {permutation_map: (d0)->(d0)} : (memref<?x?xf32>, index, index) -> vector<128xf32>
|
||||
%0 = vector.transfer_read %arg0[%c3, %c3] {permutation_map: (d0)->(d0)} : memref<?x?xf32>, vector<128xf32>
|
||||
}
|
||||
|
||||
// -----
|
||||
|
||||
func @test_vector_transfer_read(memref<?x?xf32>) {
|
||||
func @test_vector.transfer_read(memref<?x?xf32>) {
|
||||
^bb0(%arg0: memref<?x?xf32>):
|
||||
%c3 = constant 3 : index
|
||||
%cst = constant 3.0 : f32
|
||||
// expected-error@+1 {{requires a permutation_map with result dims of the same rank as the vector type}}
|
||||
%0 = vector_transfer_read %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d0, d1)} : (memref<?x?xf32>, index, index) -> vector<128xf32>
|
||||
%0 = vector.transfer_read %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0, d1)} : memref<?x?xf32>, vector<128xf32>
|
||||
}
|
||||
|
||||
// -----
|
||||
|
||||
func @test_vector_transfer_read(memref<?x?xf32>) {
|
||||
func @test_vector.transfer_read(memref<?x?xf32>) {
|
||||
^bb0(%arg0: memref<?x?xf32>):
|
||||
%c3 = constant 3 : index
|
||||
%cst = constant 3.0 : f32
|
||||
// expected-error@+1 {{requires a projected permutation_map (at most one dim or the zero constant can appear in each result)}}
|
||||
%0 = vector_transfer_read %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d0 + d1)} : (memref<?x?xf32>, index, index) -> vector<128xf32>
|
||||
%0 = vector.transfer_read %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0 + d1)} : memref<?x?xf32>, vector<128xf32>
|
||||
}
|
||||
|
||||
// -----
|
||||
|
||||
func @test_vector_transfer_read(memref<?x?xf32>) {
|
||||
func @test_vector.transfer_read(memref<?x?xf32>) {
|
||||
^bb0(%arg0: memref<?x?xf32>):
|
||||
%c3 = constant 3 : index
|
||||
%cst = constant 3.0 : f32
|
||||
// expected-error@+1 {{requires a projected permutation_map (at most one dim or the zero constant can appear in each result)}}
|
||||
%0 = vector_transfer_read %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d0 + 1)} : (memref<?x?xf32>, index, index) -> vector<128xf32>
|
||||
%0 = vector.transfer_read %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0 + 1)} : memref<?x?xf32>, vector<128xf32>
|
||||
}
|
||||
|
||||
// -----
|
||||
|
||||
func @test_vector_transfer_read(memref<?x?x?xf32>) {
|
||||
func @test_vector.transfer_read(memref<?x?x?xf32>) {
|
||||
^bb0(%arg0: memref<?x?x?xf32>):
|
||||
%c3 = constant 3 : index
|
||||
%cst = constant 3.0 : f32
|
||||
// expected-error@+1 {{requires a permutation_map that is a permutation (found one dim used more than once)}}
|
||||
%0 = vector_transfer_read %arg0, %c3, %c3, %c3 {permutation_map: (d0, d1, d2)->(d0, d0)} : (memref<?x?x?xf32>, index, index, index) -> vector<3x7xf32>
|
||||
%0 = vector.transfer_read %arg0[%c3, %c3, %c3] {permutation_map: (d0, d1, d2)->(d0, d0)} : memref<?x?x?xf32>, vector<3x7xf32>
|
||||
}
|
||||
|
||||
// -----
|
||||
|
||||
func @test_vector_transfer_write(memref<?x?xf32>) {
|
||||
func @test_vector.transfer_write(memref<?x?xf32>) {
|
||||
^bb0(%arg0: memref<?x?xf32>):
|
||||
%c3 = constant 3 : index
|
||||
%cst = constant splat<vector<128 x f32>, 3.0> : vector<128 x f32>
|
||||
// expected-error@+1 {{expected 5 operand types but had 4}}
|
||||
%0 = "vector_transfer_write"(%cst, %arg0, %c3, %c3, %c3) : (vector<128xf32>, memref<?x?xf32>, index, index) -> ()
|
||||
%0 = "vector.transfer_write"(%cst, %arg0, %c3, %c3, %c3) : (vector<128xf32>, memref<?x?xf32>, index, index) -> ()
|
||||
}
|
||||
|
||||
// -----
|
||||
|
||||
func @test_vector_transfer_write(memref<?x?xf32>) {
|
||||
func @test_vector.transfer_write(memref<?x?xf32>) {
|
||||
^bb0(%arg0: memref<?x?xf32>):
|
||||
%c3 = constant 3 : index
|
||||
%cst = constant splat<vector<128 x f32>, 3.0> : vector<128 x f32>
|
||||
// expected-error@+1 {{requires number of operands and input types to match}}
|
||||
vector_transfer_write %cst, %arg0, %c3, %c3, %c3 : vector<128xf32>, memref<?x?xf32>, index, index
|
||||
// expected-error@+1 {{expects 4 operands (of which 2 indices)}}
|
||||
vector.transfer_write %cst, %arg0[%c3, %c3, %c3] : vector<128xf32>, memref<?x?xf32>
|
||||
}
|
||||
|
||||
// -----
|
||||
|
||||
func @test_vector_transfer_write(memref<?x?xf32>) {
|
||||
func @test_vector.transfer_write(memref<?x?xf32>) {
|
||||
^bb0(%arg0: memref<?x?xf32>):
|
||||
%c3 = constant 3 : index
|
||||
%cst = constant splat<vector<128 x f32>, 3.0> : vector<128 x f32>
|
||||
// expected-error@+1 {{requires an AffineMapAttr named 'permutation_map'}}
|
||||
vector_transfer_write %cst, %arg0, %c3, %c3 : vector<128xf32>, memref<?x?xf32>, index, index
|
||||
vector.transfer_write %cst, %arg0[%c3, %c3] : vector<128xf32>, memref<?x?xf32>
|
||||
}
|
||||
|
||||
// -----
|
||||
|
||||
func @test_vector_transfer_write(memref<?x?xf32>) {
|
||||
func @test_vector.transfer_write(memref<?x?xf32>) {
|
||||
^bb0(%arg0: memref<?x?xf32>):
|
||||
%c3 = constant 3 : index
|
||||
%cst = constant splat<vector<128 x f32>, 3.0> : vector<128 x f32>
|
||||
// expected-error@+1 {{requires an AffineMapAttr named 'permutation_map'}}
|
||||
vector_transfer_write %cst, %arg0, %c3, %c3 {perm: (d0)->(d0)} : vector<128xf32>, memref<?x?xf32>, index, index
|
||||
vector.transfer_write %cst, %arg0[%c3, %c3] {perm: (d0)->(d0)} : vector<128xf32>, memref<?x?xf32>
|
||||
}
|
||||
|
||||
// -----
|
||||
|
||||
func @test_vector_transfer_write(memref<?x?xf32>) {
|
||||
func @test_vector.transfer_write(memref<?x?xf32>) {
|
||||
^bb0(%arg0: memref<?x?xf32>):
|
||||
%c3 = constant 3 : index
|
||||
%cst = constant splat<vector<128 x f32>, 3.0> : vector<128 x f32>
|
||||
// expected-error@+1 {{requires a permutation_map with input dims of the same rank as the memref type}}
|
||||
vector_transfer_write %cst, %arg0, %c3, %c3 {permutation_map: (d0)->(d0)} : vector<128xf32>, memref<?x?xf32>, index, index
|
||||
vector.transfer_write %cst, %arg0[%c3, %c3] {permutation_map: (d0)->(d0)} : vector<128xf32>, memref<?x?xf32>
|
||||
}
|
||||
|
||||
// -----
|
||||
|
||||
func @test_vector_transfer_write(memref<?x?xf32>) {
|
||||
func @test_vector.transfer_write(memref<?x?xf32>) {
|
||||
^bb0(%arg0: memref<?x?xf32>):
|
||||
%c3 = constant 3 : index
|
||||
%cst = constant splat<vector<128 x f32>, 3.0> : vector<128 x f32>
|
||||
// expected-error@+1 {{requires a permutation_map with result dims of the same rank as the vector type}}
|
||||
vector_transfer_write %cst, %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d0, d1)} : vector<128xf32>, memref<?x?xf32>, index, index
|
||||
vector.transfer_write %cst, %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0, d1)} : vector<128xf32>, memref<?x?xf32>
|
||||
}
|
||||
|
||||
// -----
|
||||
|
||||
func @test_vector_transfer_write(memref<?x?xf32>) {
|
||||
func @test_vector.transfer_write(memref<?x?xf32>) {
|
||||
^bb0(%arg0: memref<?x?xf32>):
|
||||
%c3 = constant 3 : index
|
||||
%cst = constant splat<vector<128 x f32>, 3.0> : vector<128 x f32>
|
||||
// expected-error@+1 {{requires a projected permutation_map (at most one dim or the zero constant can appear in each result)}}
|
||||
vector_transfer_write %cst, %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d0 + d1)} : vector<128xf32>, memref<?x?xf32>, index, index
|
||||
vector.transfer_write %cst, %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0 + d1)} : vector<128xf32>, memref<?x?xf32>
|
||||
}
|
||||
|
||||
// -----
|
||||
|
||||
func @test_vector_transfer_write(memref<?x?xf32>) {
|
||||
func @test_vector.transfer_write(memref<?x?xf32>) {
|
||||
^bb0(%arg0: memref<?x?xf32>):
|
||||
%c3 = constant 3 : index
|
||||
%cst = constant splat<vector<128 x f32>, 3.0> : vector<128 x f32>
|
||||
// expected-error@+1 {{requires a projected permutation_map (at most one dim or the zero constant can appear in each result)}}
|
||||
vector_transfer_write %cst, %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d0 + 1)} : vector<128xf32>, memref<?x?xf32>, index, index
|
||||
vector.transfer_write %cst, %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0 + 1)} : vector<128xf32>, memref<?x?xf32>
|
||||
}
|
||||
// -----
|
||||
|
||||
func @test_vector_transfer_write(memref<?x?x?xf32>) {
|
||||
func @test_vector.transfer_write(memref<?x?x?xf32>) {
|
||||
^bb0(%arg0: memref<?x?x?xf32>):
|
||||
%c3 = constant 3 : index
|
||||
%cst = constant splat<vector<3 x 7 x f32>, 3.0> : vector<3 x 7 x f32>
|
||||
// expected-error@+1 {{requires a permutation_map that is a permutation (found one dim used more than once)}}
|
||||
vector_transfer_write %cst, %arg0, %c3, %c3, %c3 {permutation_map: (d0, d1, d2)->(d0, d0)} : vector<3x7xf32>, memref<?x?x?xf32>, index, index, index
|
||||
vector.transfer_write %cst, %arg0[%c3, %c3, %c3] {permutation_map: (d0, d1, d2)->(d0, d0)} : vector<3x7xf32>, memref<?x?x?xf32>
|
||||
}
|
||||
|
||||
// -----
|
||||
|
|
|
@ -8,13 +8,13 @@ func @materialize_read_1d() {
|
|||
%A = alloc () : memref<7x42xf32>
|
||||
affine.for %i0 = 0 to 7 step 4 {
|
||||
affine.for %i1 = 0 to 42 step 4 {
|
||||
%f1 = vector_transfer_read %A, %i0, %i1 {permutation_map: (d0, d1) -> (d0)} : (memref<7x42xf32>, index, index) -> vector<4xf32>
|
||||
%f1 = vector.transfer_read %A[%i0, %i1] {permutation_map: (d0, d1) -> (d0)} : memref<7x42xf32>, vector<4xf32>
|
||||
%ip1 = affine.apply (d0) -> (d0 + 1) (%i1)
|
||||
%f2 = vector_transfer_read %A, %i0, %ip1 {permutation_map: (d0, d1) -> (d0)} : (memref<7x42xf32>, index, index) -> vector<4xf32>
|
||||
%f2 = vector.transfer_read %A[%i0, %ip1] {permutation_map: (d0, d1) -> (d0)} : memref<7x42xf32>, vector<4xf32>
|
||||
%ip2 = affine.apply (d0) -> (d0 + 2) (%i1)
|
||||
%f3 = vector_transfer_read %A, %i0, %ip2 {permutation_map: (d0, d1) -> (d0)} : (memref<7x42xf32>, index, index) -> vector<4xf32>
|
||||
%f3 = vector.transfer_read %A[%i0, %ip2] {permutation_map: (d0, d1) -> (d0)} : memref<7x42xf32>, vector<4xf32>
|
||||
%ip3 = affine.apply (d0) -> (d0 + 3) (%i1)
|
||||
%f4 = vector_transfer_read %A, %i0, %ip3 {permutation_map: (d0, d1) -> (d0)} : (memref<7x42xf32>, index, index) -> vector<4xf32>
|
||||
%f4 = vector.transfer_read %A[%i0, %ip3] {permutation_map: (d0, d1) -> (d0)} : memref<7x42xf32>, vector<4xf32>
|
||||
// Both accesses in the load must be clipped otherwise %i1 + 2 and %i1 + 3 will go out of bounds.
|
||||
// CHECK: {{.*}} = select
|
||||
// CHECK: %[[FILTERED1:.*]] = select
|
||||
|
@ -34,9 +34,9 @@ func @materialize_read_1d_partially_specialized(%dyn1 : index, %dyn2 : index, %d
|
|||
affine.for %i2 = 0 to %dyn2 {
|
||||
affine.for %i3 = 0 to 42 step 2 {
|
||||
affine.for %i4 = 0 to %dyn4 {
|
||||
%f1 = vector_transfer_read %A, %i0, %i1, %i2, %i3, %i4 {permutation_map: (d0, d1, d2, d3, d4) -> (d3)} : ( memref<7x?x?x42x?xf32>, index, index, index, index, index) -> vector<4xf32>
|
||||
%f1 = vector.transfer_read %A[%i0, %i1, %i2, %i3, %i4] {permutation_map: (d0, d1, d2, d3, d4) -> (d3)} : memref<7x?x?x42x?xf32>, vector<4xf32>
|
||||
%i3p1 = affine.apply (d0) -> (d0 + 1) (%i3)
|
||||
%f2 = vector_transfer_read %A, %i0, %i1, %i2, %i3p1, %i4 {permutation_map: (d0, d1, d2, d3, d4) -> (d3)} : ( memref<7x?x?x42x?xf32>, index, index, index, index, index) -> vector<4xf32>
|
||||
%f2 = vector.transfer_read %A[%i0, %i1, %i2, %i3p1, %i4] {permutation_map: (d0, d1, d2, d3, d4) -> (d3)} : memref<7x?x?x42x?xf32>, vector<4xf32>
|
||||
}
|
||||
}
|
||||
}
|
||||
|
@ -63,7 +63,7 @@ func @materialize_read(%M: index, %N: index, %O: index, %P: index) {
|
|||
// CHECK-NEXT: %[[D2:.*]] = dim %0, 2 : memref<?x?x?x?xf32>
|
||||
// CHECK-NEXT: %[[D3:.*]] = dim %0, 3 : memref<?x?x?x?xf32>
|
||||
// CHECK: %[[ALLOC:.*]] = alloc() : memref<5x4x3xf32>
|
||||
// CHECK-NEXT: %[[VECTOR_VIEW:.*]] = vector_type_cast %[[ALLOC]] : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>>
|
||||
// CHECK-NEXT: %[[VECTOR_VIEW:.*]] = vector.type_cast %[[ALLOC]] : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>>
|
||||
// CHECK-NEXT: affine.for %[[I4:.*]] = 0 to 3 {
|
||||
// CHECK-NEXT: affine.for %[[I5:.*]] = 0 to 4 {
|
||||
// CHECK-NEXT: affine.for %[[I6:.*]] = 0 to 5 {
|
||||
|
@ -121,7 +121,7 @@ func @materialize_read(%M: index, %N: index, %O: index, %P: index) {
|
|||
affine.for %i1 = 0 to %N {
|
||||
affine.for %i2 = 0 to %O {
|
||||
affine.for %i3 = 0 to %P step 5 {
|
||||
%f = vector_transfer_read %A, %i0, %i1, %i2, %i3 {permutation_map: (d0, d1, d2, d3) -> (d3, 0, d0)} : (memref<?x?x?x?xf32, 0>, index, index, index, index) -> vector<5x4x3xf32>
|
||||
%f = vector.transfer_read %A[%i0, %i1, %i2, %i3] {permutation_map: (d0, d1, d2, d3) -> (d3, 0, d0)} : memref<?x?x?x?xf32>, vector<5x4x3xf32>
|
||||
}
|
||||
}
|
||||
}
|
||||
|
@ -142,7 +142,7 @@ func @materialize_write(%M: index, %N: index, %O: index, %P: index) {
|
|||
// CHECK-NEXT: %[[D2:.*]] = dim %0, 2 : memref<?x?x?x?xf32>
|
||||
// CHECK-NEXT: %[[D3:.*]] = dim %0, 3 : memref<?x?x?x?xf32>
|
||||
// CHECK: %[[ALLOC:.*]] = alloc() : memref<5x4x3xf32>
|
||||
// CHECK-NEXT: %[[VECTOR_VIEW:.*]] = vector_type_cast {{.*}} : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>>
|
||||
// CHECK-NEXT: %[[VECTOR_VIEW:.*]] = vector.type_cast {{.*}} : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>>
|
||||
// CHECK: store %cst, {{.*}} : memref<1xvector<5x4x3xf32>>
|
||||
// CHECK-NEXT: affine.for %[[I4:.*]] = 0 to 3 {
|
||||
// CHECK-NEXT: affine.for %[[I5:.*]] = 0 to 4 {
|
||||
|
@ -205,7 +205,7 @@ func @materialize_write(%M: index, %N: index, %O: index, %P: index) {
|
|||
affine.for %i1 = 0 to %N step 4 {
|
||||
affine.for %i2 = 0 to %O {
|
||||
affine.for %i3 = 0 to %P step 5 {
|
||||
vector_transfer_write %f1, %A, %i0, %i1, %i2, %i3 {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d0)} : vector<5x4x3xf32>, memref<?x?x?x?xf32, 0>, index, index, index, index
|
||||
vector.transfer_write %f1, %A[%i0, %i1, %i2, %i3] {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d0)} : vector<5x4x3xf32>, memref<?x?x?x?xf32>
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
|
@ -18,18 +18,18 @@ func @materialize(%M : index, %N : index, %O : index, %P : index) {
|
|||
// CHECK-NEXT: %[[b:[0-9]+]] = {{.*}}[[ID1]](%i1)
|
||||
// CHECK-NEXT: %[[c:[0-9]+]] = {{.*}}[[ID1]](%i2)
|
||||
// CHECK-NEXT: %[[d:[0-9]+]] = {{.*}}[[ID1]](%i3)
|
||||
// CHECK-NEXT: vector_transfer_write {{.*}}, %0, %[[a]], %[[b]], %[[c]], %[[d]] {permutation_map: #[[D0D1D2D3TOD1D0]]} : vector<4x4xf32>, memref<?x?x?x?xf32>, index, index, index, index
|
||||
// CHECK-NEXT: vector.transfer_write {{.*}}, %0[%[[a]], %[[b]], %[[c]], %[[d]]] {permutation_map: #[[D0D1D2D3TOD1D0]]} : vector<4x4xf32>, memref<?x?x?x?xf32>
|
||||
// CHECK: %[[b1:[0-9]+]] = {{.*}}[[D0P1]](%i1)
|
||||
// CHECK: vector_transfer_write {{.*}}, %0, {{.*}}, %[[b1]], {{.*}} {permutation_map: #[[D0D1D2D3TOD1D0]]} : vector<4x4xf32>, memref<?x?x?x?xf32>, index, index, index, index
|
||||
// CHECK: vector.transfer_write {{.*}}, %0[{{.*}}, %[[b1]], {{.*}}] {permutation_map: #[[D0D1D2D3TOD1D0]]} : vector<4x4xf32>, memref<?x?x?x?xf32>
|
||||
// CHECK: %[[b2:[0-9]+]] = {{.*}}[[D0P2]](%i1)
|
||||
// CHECK: vector_transfer_write {{.*}}, %0, {{.*}}, %[[b2]], {{.*}} {permutation_map: #[[D0D1D2D3TOD1D0]]} : vector<4x4xf32>, memref<?x?x?x?xf32>, index, index, index, index
|
||||
// CHECK: vector.transfer_write {{.*}}, %0[{{.*}}, %[[b2]], {{.*}}] {permutation_map: #[[D0D1D2D3TOD1D0]]} : vector<4x4xf32>, memref<?x?x?x?xf32>
|
||||
// CHECK: %[[b3:[0-9]+]] = {{.*}}[[D0P3]](%i1)
|
||||
// CHECK: vector_transfer_write {{.*}}, %0, {{.*}}, %[[b3]], {{.*}} {permutation_map: #[[D0D1D2D3TOD1D0]]} : vector<4x4xf32>, memref<?x?x?x?xf32>, index, index, index, index
|
||||
// CHECK: vector.transfer_write {{.*}}, %0[{{.*}}, %[[b3]], {{.*}}] {permutation_map: #[[D0D1D2D3TOD1D0]]} : vector<4x4xf32>, memref<?x?x?x?xf32>
|
||||
affine.for %i0 = 0 to %M step 4 {
|
||||
affine.for %i1 = 0 to %N step 4 {
|
||||
affine.for %i2 = 0 to %O {
|
||||
affine.for %i3 = 0 to %P step 4 {
|
||||
"vector_transfer_write"(%f1, %A, %i0, %i1, %i2, %i3) {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d0)} : (vector<4x4x4xf32>, memref<?x?x?x?xf32, 0>, index, index, index, index) -> ()
|
||||
vector.transfer_write %f1, %A[%i0, %i1, %i2, %i3] {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d0)} : vector<4x4x4xf32>, memref<?x?x?x?xf32, 0>
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
|
@ -17,22 +17,22 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
|
|||
// 4x unroll (jammed by construction).
|
||||
// CHECK: affine.for %i0 = 0 to %arg0 {
|
||||
// CHECK-NEXT: affine.for %i1 = 0 to %arg1 step 32 {
|
||||
// CHECK-NEXT: [[CST0:%.*]] = constant splat<vector<8xf32>, 1.000000e+00> : vector<8xf32>
|
||||
// CHECK-NEXT: [[CST1:%.*]] = constant splat<vector<8xf32>, 1.000000e+00> : vector<8xf32>
|
||||
// CHECK-NEXT: [[CST2:%.*]] = constant splat<vector<8xf32>, 1.000000e+00> : vector<8xf32>
|
||||
// CHECK-NEXT: [[CST3:%.*]] = constant splat<vector<8xf32>, 1.000000e+00> : vector<8xf32>
|
||||
// CHECK-NEXT: [[VAL00:%.*]] = affine.apply [[ID1]]{{.*}}
|
||||
// CHECK-NEXT: [[VAL01:%.*]] = affine.apply [[ID1]]{{.*}}
|
||||
// CHECK-NEXT: vector_transfer_write [[CST0]], {{.*}}, [[VAL00]], [[VAL01]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
|
||||
// CHECK-NEXT: [[VAL10:%.*]] = affine.apply [[ID1]]{{.*}}
|
||||
// CHECK-NEXT: [[VAL11:%.*]] = affine.apply [[D0P8]]{{.*}}
|
||||
// CHECK-NEXT: vector_transfer_write [[CST1]], {{.*}}, [[VAL10]], [[VAL11]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
|
||||
// CHECK-NEXT: [[VAL20:%.*]] = affine.apply [[ID1]]{{.*}}
|
||||
// CHECK-NEXT: [[VAL21:%.*]] = affine.apply [[D0P16]]{{.*}}
|
||||
// CHECK-NEXT: vector_transfer_write [[CST2]], {{.*}}, [[VAL20]], [[VAL21]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
|
||||
// CHECK-NEXT: [[VAL30:%.*]] = affine.apply [[ID1]]{{.*}}
|
||||
// CHECK-NEXT: [[VAL31:%.*]] = affine.apply [[D0P24]]{{.*}}
|
||||
// CHECK-NEXT: vector_transfer_write [[CST3]], {{.*}}, [[VAL30]], [[VAL31]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
|
||||
// CHECK-NEXT: %[[CST0:.*]] = constant splat<vector<8xf32>, 1.000000e+00> : vector<8xf32>
|
||||
// CHECK-NEXT: %[[CST1:.*]] = constant splat<vector<8xf32>, 1.000000e+00> : vector<8xf32>
|
||||
// CHECK-NEXT: %[[CST2:.*]] = constant splat<vector<8xf32>, 1.000000e+00> : vector<8xf32>
|
||||
// CHECK-NEXT: %[[CST3:.*]] = constant splat<vector<8xf32>, 1.000000e+00> : vector<8xf32>
|
||||
// CHECK-NEXT: %[[VAL00:.*]] = affine.apply [[ID1]]{{.*}}
|
||||
// CHECK-NEXT: %[[VAL01:.*]] = affine.apply [[ID1]]{{.*}}
|
||||
// CHECK-NEXT: vector.transfer_write %[[CST0]], {{.*}}[%[[VAL00]], %[[VAL01]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
|
||||
// CHECK-NEXT: %[[VAL10:.*]] = affine.apply [[ID1]]{{.*}}
|
||||
// CHECK-NEXT: %[[VAL11:.*]] = affine.apply [[D0P8]]{{.*}}
|
||||
// CHECK-NEXT: vector.transfer_write %[[CST1]], {{.*}}[%[[VAL10]], %[[VAL11]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
|
||||
// CHECK-NEXT: %[[VAL20:.*]] = affine.apply [[ID1]]{{.*}}
|
||||
// CHECK-NEXT: %[[VAL21:.*]] = affine.apply [[D0P16]]{{.*}}
|
||||
// CHECK-NEXT: vector.transfer_write %[[CST2]], {{.*}}[%[[VAL20]], %[[VAL21]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
|
||||
// CHECK-NEXT: %[[VAL30:.*]] = affine.apply [[ID1]]{{.*}}
|
||||
// CHECK-NEXT: %[[VAL31:.*]] = affine.apply [[D0P24]]{{.*}}
|
||||
// CHECK-NEXT: vector.transfer_write %[[CST3]], {{.*}}[%[[VAL30]], %[[VAL31]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
|
||||
//
|
||||
affine.for %i0 = 0 to %M {
|
||||
affine.for %i1 = 0 to %N {
|
||||
|
@ -43,22 +43,22 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
|
|||
// 4x unroll (jammed by construction).
|
||||
// CHECK: affine.for %i2 = 0 to %arg0 {
|
||||
// CHECK-NEXT: affine.for %i3 = 0 to %arg1 step 32 {
|
||||
// CHECK-NEXT: [[CST0:%.*]] = constant splat<vector<8xf32>, 2.000000e+00> : vector<8xf32>
|
||||
// CHECK-NEXT: [[CST1:%.*]] = constant splat<vector<8xf32>, 2.000000e+00> : vector<8xf32>
|
||||
// CHECK-NEXT: [[CST2:%.*]] = constant splat<vector<8xf32>, 2.000000e+00> : vector<8xf32>
|
||||
// CHECK-NEXT: [[CST3:%.*]] = constant splat<vector<8xf32>, 2.000000e+00> : vector<8xf32>
|
||||
// CHECK-NEXT: [[VAL00:%.*]] = affine.apply [[ID1]]{{.*}}
|
||||
// CHECK-NEXT: [[VAL01:%.*]] = affine.apply [[ID1]]{{.*}}
|
||||
// CHECK-NEXT: vector_transfer_write [[CST0]], {{.*}}, [[VAL00]], [[VAL01]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
|
||||
// CHECK-NEXT: [[VAL10:%.*]] = affine.apply [[ID1]]{{.*}}
|
||||
// CHECK-NEXT: [[VAL11:%.*]] = affine.apply [[D0P8]]{{.*}}
|
||||
// CHECK-NEXT: vector_transfer_write [[CST1]], {{.*}}, [[VAL10]], [[VAL11]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
|
||||
// CHECK-NEXT: [[VAL20:%.*]] = affine.apply [[ID1]]{{.*}}
|
||||
// CHECK-NEXT: [[VAL21:%.*]] = affine.apply [[D0P16]]{{.*}}
|
||||
// CHECK-NEXT: vector_transfer_write [[CST2]], {{.*}}, [[VAL20]], [[VAL21]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
|
||||
// CHECK-NEXT: [[VAL30:%.*]] = affine.apply [[ID1]]{{.*}}
|
||||
// CHECK-NEXT: [[VAL31:%.*]] = affine.apply [[D0P24]]{{.*}}
|
||||
// CHECK-NEXT: vector_transfer_write [[CST3]], {{.*}}, [[VAL30]], [[VAL31]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
|
||||
// CHECK-NEXT: %[[CST0:.*]] = constant splat<vector<8xf32>, 2.000000e+00> : vector<8xf32>
|
||||
// CHECK-NEXT: %[[CST1:.*]] = constant splat<vector<8xf32>, 2.000000e+00> : vector<8xf32>
|
||||
// CHECK-NEXT: %[[CST2:.*]] = constant splat<vector<8xf32>, 2.000000e+00> : vector<8xf32>
|
||||
// CHECK-NEXT: %[[CST3:.*]] = constant splat<vector<8xf32>, 2.000000e+00> : vector<8xf32>
|
||||
// CHECK-NEXT: %[[VAL00:.*]] = affine.apply [[ID1]]{{.*}}
|
||||
// CHECK-NEXT: %[[VAL01:.*]] = affine.apply [[ID1]]{{.*}}
|
||||
// CHECK-NEXT: vector.transfer_write %[[CST0]], {{.*}}[%[[VAL00]], %[[VAL01]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
|
||||
// CHECK-NEXT: %[[VAL10:.*]] = affine.apply [[ID1]]{{.*}}
|
||||
// CHECK-NEXT: %[[VAL11:.*]] = affine.apply [[D0P8]]{{.*}}
|
||||
// CHECK-NEXT: vector.transfer_write %[[CST1]], {{.*}}[%[[VAL10]], %[[VAL11]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
|
||||
// CHECK-NEXT: %[[VAL20:.*]] = affine.apply [[ID1]]{{.*}}
|
||||
// CHECK-NEXT: %[[VAL21:.*]] = affine.apply [[D0P16]]{{.*}}
|
||||
// CHECK-NEXT: vector.transfer_write %[[CST2]], {{.*}}[%[[VAL20]], %[[VAL21]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
|
||||
// CHECK-NEXT: %[[VAL30:.*]] = affine.apply [[ID1]]{{.*}}
|
||||
// CHECK-NEXT: %[[VAL31:.*]] = affine.apply [[D0P24]]{{.*}}
|
||||
// CHECK-NEXT: vector.transfer_write %[[CST3]], {{.*}}[%[[VAL30]], %[[VAL31]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
|
||||
//
|
||||
affine.for %i2 = 0 to %M {
|
||||
affine.for %i3 = 0 to %N {
|
||||
|
@ -71,44 +71,44 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
|
|||
// CHECK-NEXT: affine.for %i5 = 0 to %arg1 step 32 {
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = vector_transfer_read
|
||||
// CHECK-NEXT: {{.*}} = vector.transfer_read
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = vector_transfer_read
|
||||
// CHECK-NEXT: {{.*}} = vector.transfer_read
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = vector_transfer_read
|
||||
// CHECK-NEXT: {{.*}} = vector.transfer_read
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = vector_transfer_read
|
||||
// CHECK-NEXT: {{.*}} = vector.transfer_read
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = vector_transfer_read
|
||||
// CHECK-NEXT: {{.*}} = vector.transfer_read
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = vector_transfer_read
|
||||
// CHECK-NEXT: {{.*}} = vector.transfer_read
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = vector_transfer_read
|
||||
// CHECK-NEXT: {{.*}} = vector.transfer_read
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = vector_transfer_read
|
||||
// CHECK-NEXT: {{.*}} = vector.transfer_read
|
||||
// CHECK-NEXT: {{.*}} = addf {{.*}} : vector<8xf32>
|
||||
// CHECK-NEXT: {{.*}} = addf {{.*}} : vector<8xf32>
|
||||
// CHECK-NEXT: {{.*}} = addf {{.*}} : vector<8xf32>
|
||||
// CHECK-NEXT: {{.*}} = addf {{.*}} : vector<8xf32>
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: vector_transfer_write
|
||||
// CHECK-NEXT: vector.transfer_write
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: vector_transfer_write
|
||||
// CHECK-NEXT: vector.transfer_write
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: vector_transfer_write
|
||||
// CHECK-NEXT: vector.transfer_write
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: vector_transfer_write
|
||||
// CHECK-NEXT: vector.transfer_write
|
||||
//
|
||||
affine.for %i4 = 0 to %M {
|
||||
affine.for %i5 = 0 to %N {
|
||||
|
|
|
@ -23,24 +23,24 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
|
|||
// CHECK-NEXT: {{.*}} = constant splat<vector<8xf32>, 1.000000e+00> : vector<8xf32>
|
||||
// CHECK-NEXT: {{.*}} = constant splat<vector<8xf32>, 1.000000e+00> : vector<8xf32>
|
||||
// CHECK-NEXT: {{.*}} = constant splat<vector<8xf32>, 1.000000e+00> : vector<8xf32>
|
||||
// CHECK-NEXT: [[VAL00:%.*]] = affine.apply [[ID1]](%i0)
|
||||
// CHECK-NEXT: [[VAL01:%.*]] = affine.apply [[ID1]](%i1)
|
||||
// CHECK-NEXT: vector_transfer_write {{.*}}, {{.*}}, [[VAL00]], [[VAL01]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
|
||||
// CHECK-NEXT: [[VAL10:%.*]] = affine.apply [[ID1]](%i0)
|
||||
// CHECK-NEXT: [[VAL11:%.*]] = affine.apply [[D0P8]](%i1)
|
||||
// CHECK-NEXT: vector_transfer_write {{.*}}, {{.*}}, [[VAL10]], [[VAL11]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
|
||||
// CHECK-NEXT: [[VAL20:%.*]] = affine.apply [[D0P1]](%i0)
|
||||
// CHECK-NEXT: [[VAL21:%.*]] = affine.apply [[ID1]](%i1)
|
||||
// CHECK-NEXT: vector_transfer_write {{.*}}, {{.*}}, [[VAL20]], [[VAL21]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
|
||||
// CHECK-NEXT: [[VAL30:%.*]] = affine.apply [[D0P1]](%i0)
|
||||
// CHECK-NEXT: [[VAL31:%.*]] = affine.apply [[D0P8]](%i1)
|
||||
// CHECK-NEXT: vector_transfer_write {{.*}}, {{.*}}, [[VAL30]], [[VAL31]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
|
||||
// CHECK-NEXT: [[VAL40:%.*]] = affine.apply [[D0P2]](%i0)
|
||||
// CHECK-NEXT: [[VAL41:%.*]] = affine.apply [[ID1]](%i1)
|
||||
// CHECK-NEXT: vector_transfer_write {{.*}}, {{.*}}, [[VAL40]], [[VAL41]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
|
||||
// CHECK-NEXT: [[VAL50:%.*]] = affine.apply [[D0P2]](%i0)
|
||||
// CHECK-NEXT: [[VAL51:%.*]] = affine.apply [[D0P8]](%i1)
|
||||
// CHECK-NEXT: vector_transfer_write {{.*}}, {{.*}}, [[VAL50]], [[VAL51]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>
|
||||
// CHECK-NEXT: %[[VAL00:.*]] = affine.apply [[ID1]](%i0)
|
||||
// CHECK-NEXT: %[[VAL01:.*]] = affine.apply [[ID1]](%i1)
|
||||
// CHECK-NEXT: vector.transfer_write {{.*}}, {{.*}}[%[[VAL00]], %[[VAL01]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
|
||||
// CHECK-NEXT: %[[VAL10:.*]] = affine.apply [[ID1]](%i0)
|
||||
// CHECK-NEXT: %[[VAL11:.*]] = affine.apply [[D0P8]](%i1)
|
||||
// CHECK-NEXT: vector.transfer_write {{.*}}, {{.*}}[%[[VAL10]], %[[VAL11]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
|
||||
// CHECK-NEXT: %[[VAL20:.*]] = affine.apply [[D0P1]](%i0)
|
||||
// CHECK-NEXT: %[[VAL21:.*]] = affine.apply [[ID1]](%i1)
|
||||
// CHECK-NEXT: vector.transfer_write {{.*}}, {{.*}}[%[[VAL20]], %[[VAL21]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
|
||||
// CHECK-NEXT: %[[VAL30:.*]] = affine.apply [[D0P1]](%i0)
|
||||
// CHECK-NEXT: %[[VAL31:.*]] = affine.apply [[D0P8]](%i1)
|
||||
// CHECK-NEXT: vector.transfer_write {{.*}}, {{.*}}[%[[VAL30]], %[[VAL31]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
|
||||
// CHECK-NEXT: %[[VAL40:.*]] = affine.apply [[D0P2]](%i0)
|
||||
// CHECK-NEXT: %[[VAL41:.*]] = affine.apply [[ID1]](%i1)
|
||||
// CHECK-NEXT: vector.transfer_write {{.*}}, {{.*}}[%[[VAL40]], %[[VAL41]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
|
||||
// CHECK-NEXT: %[[VAL50:.*]] = affine.apply [[D0P2]](%i0)
|
||||
// CHECK-NEXT: %[[VAL51:.*]] = affine.apply [[D0P8]](%i1)
|
||||
// CHECK-NEXT: vector.transfer_write {{.*}}, {{.*}}[%[[VAL50]], %[[VAL51]]] {permutation_map: [[D0D1TOD1]]} : vector<8xf32>, memref<?x?xf32>
|
||||
affine.for %i0 = 0 to %M {
|
||||
affine.for %i1 = 0 to %N {
|
||||
// non-scoped %f1
|
||||
|
@ -63,40 +63,40 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
|
|||
// CHECK-NEXT: affine.for %i5 = 0 to %arg1 step 16 {
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = vector_transfer_read
|
||||
// CHECK-NEXT: {{.*}} = vector.transfer_read
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = vector_transfer_read
|
||||
// CHECK-NEXT: {{.*}} = vector.transfer_read
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = vector_transfer_read
|
||||
// CHECK-NEXT: {{.*}} = vector.transfer_read
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = vector_transfer_read
|
||||
// CHECK-NEXT: {{.*}} = vector.transfer_read
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = vector_transfer_read
|
||||
// CHECK-NEXT: {{.*}} = vector.transfer_read
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = vector_transfer_read
|
||||
// CHECK-NEXT: {{.*}} = vector.transfer_read
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = vector_transfer_read
|
||||
// CHECK-NEXT: {{.*}} = vector.transfer_read
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = vector_transfer_read
|
||||
// CHECK-NEXT: {{.*}} = vector.transfer_read
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = vector_transfer_read
|
||||
// CHECK-NEXT: {{.*}} = vector.transfer_read
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = vector_transfer_read
|
||||
// CHECK-NEXT: {{.*}} = vector.transfer_read
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = vector_transfer_read
|
||||
// CHECK-NEXT: {{.*}} = vector.transfer_read
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = vector_transfer_read
|
||||
// CHECK-NEXT: {{.*}} = vector.transfer_read
|
||||
// CHECK-NEXT: {{.*}} = addf {{.*}} : vector<8xf32>
|
||||
// CHECK-NEXT: {{.*}} = addf {{.*}} : vector<8xf32>
|
||||
// CHECK-NEXT: {{.*}} = addf {{.*}} : vector<8xf32>
|
||||
|
@ -105,22 +105,22 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
|
|||
// CHECK-NEXT: {{.*}} = addf {{.*}} : vector<8xf32>
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: vector_transfer_write
|
||||
// CHECK-NEXT: vector.transfer_write
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: vector_transfer_write
|
||||
// CHECK-NEXT: vector.transfer_write
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: vector_transfer_write
|
||||
// CHECK-NEXT: vector.transfer_write
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: vector_transfer_write
|
||||
// CHECK-NEXT: vector.transfer_write
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: vector_transfer_write
|
||||
// CHECK-NEXT: vector.transfer_write
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: {{.*}} = affine.apply
|
||||
// CHECK-NEXT: vector_transfer_write
|
||||
// CHECK-NEXT: vector.transfer_write
|
||||
//
|
||||
affine.for %i4 = 0 to %M {
|
||||
affine.for %i5 = 0 to %N {
|
||||
|
|
|
@ -17,12 +17,12 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
|
|||
// CHECK-NEXT: affine.for %i1 = 0 to %arg1 step 32 {
|
||||
// CHECK-NEXT: {{.*}} = constant splat<vector<3x16xf32>, 1.000000e+00> : vector<3x16xf32>
|
||||
// CHECK-NEXT: {{.*}} = constant splat<vector<3x16xf32>, 1.000000e+00> : vector<3x16xf32>
|
||||
// CHECK-NEXT: [[VAL00:%.*]] = affine.apply [[ID1]](%i0)
|
||||
// CHECK-NEXT: [[VAL01:%.*]] = affine.apply [[ID1]](%i1)
|
||||
// CHECK-NEXT: vector_transfer_write {{.*}}, {{.*}}, [[VAL00]], [[VAL01]] {permutation_map: [[ID2]]} : vector<3x16xf32>
|
||||
// CHECK-NEXT: [[VAL10:%.*]] = affine.apply [[ID1]](%i0)
|
||||
// CHECK-NEXT: [[VAL11:%.*]] = affine.apply [[D0P16]](%i1)
|
||||
// CHECK-NEXT: vector_transfer_write {{.*}}, {{.*}}, [[VAL10]], [[VAL11]] {permutation_map: [[ID2]]} : vector<3x16xf32>
|
||||
// CHECK-NEXT: %[[VAL00:.*]] = affine.apply [[ID1]](%i0)
|
||||
// CHECK-NEXT: %[[VAL01:.*]] = affine.apply [[ID1]](%i1)
|
||||
// CHECK-NEXT: vector.transfer_write {{.*}}, {{.*}}[%[[VAL00]], %[[VAL01]]] {permutation_map: [[ID2]]} : vector<3x16xf32>, memref<?x?xf32>
|
||||
// CHECK-NEXT: %[[VAL10:.*]] = affine.apply [[ID1]](%i0)
|
||||
// CHECK-NEXT: %[[VAL11:.*]] = affine.apply [[D0P16]](%i1)
|
||||
// CHECK-NEXT: vector.transfer_write {{.*}}, {{.*}}[%[[VAL10]], %[[VAL11]]] {permutation_map: [[ID2]]} : vector<3x16xf32>, memref<?x?xf32>
|
||||
//
|
||||
affine.for %i0 = 0 to %M {
|
||||
affine.for %i1 = 0 to %N {
|
||||
|
@ -35,12 +35,12 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
|
|||
// CHECK-NEXT: affine.for %i3 = 0 to %arg1 step 32 {
|
||||
// CHECK-NEXT: {{.*}} = constant splat<vector<3x16xf32>, 2.000000e+00> : vector<3x16xf32>
|
||||
// CHECK-NEXT: {{.*}} = constant splat<vector<3x16xf32>, 2.000000e+00> : vector<3x16xf32>
|
||||
// CHECK-NEXT: [[VAL00:%.*]] = affine.apply [[ID1]](%i2)
|
||||
// CHECK-NEXT: [[VAL01:%.*]] = affine.apply [[ID1]](%i3)
|
||||
// CHECK-NEXT: vector_transfer_write {{.*}}, {{.*}}, [[VAL00]], [[VAL01]] {permutation_map: [[ID2]]} : vector<3x16xf32>
|
||||
// CHECK-NEXT: [[VAL10:%.*]] = affine.apply [[ID1]](%i2)
|
||||
// CHECK-NEXT: [[VAL11:%.*]] = affine.apply [[D0P16]](%i3)
|
||||
// CHECK-NEXT: vector_transfer_write {{.*}}, {{.*}}, [[VAL10]], [[VAL11]] {permutation_map: [[ID2]]} : vector<3x16xf32>
|
||||
// CHECK-NEXT: %[[VAL00:.*]] = affine.apply [[ID1]](%i2)
|
||||
// CHECK-NEXT: %[[VAL01:.*]] = affine.apply [[ID1]](%i3)
|
||||
// CHECK-NEXT: vector.transfer_write {{.*}}, {{.*}}[%[[VAL00]], %[[VAL01]]] {permutation_map: [[ID2]]} : vector<3x16xf32>, memref<?x?xf32>
|
||||
// CHECK-NEXT: %[[VAL10:.*]] = affine.apply [[ID1]](%i2)
|
||||
// CHECK-NEXT: %[[VAL11:.*]] = affine.apply [[D0P16]](%i3)
|
||||
// CHECK-NEXT: vector.transfer_write {{.*}}, {{.*}}[%[[VAL10]], %[[VAL11]]] {permutation_map: [[ID2]]} : vector<3x16xf32>, memref<?x?xf32>
|
||||
//
|
||||
affine.for %i2 = 0 to %M {
|
||||
affine.for %i3 = 0 to %N {
|
||||
|
@@ -53,24 +53,24 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
 // CHECK-NEXT: affine.for %i5 = 0 to %arg1 step 32 {
 // CHECK-NEXT: {{.*}} = affine.apply
 // CHECK-NEXT: {{.*}} = affine.apply
-// CHECK-NEXT: {{.*}} = vector_transfer_read
+// CHECK-NEXT: {{.*}} = vector.transfer_read
 // CHECK-NEXT: {{.*}} = affine.apply
 // CHECK-NEXT: {{.*}} = affine.apply
-// CHECK-NEXT: {{.*}} = vector_transfer_read
+// CHECK-NEXT: {{.*}} = vector.transfer_read
 // CHECK-NEXT: {{.*}} = affine.apply
 // CHECK-NEXT: {{.*}} = affine.apply
-// CHECK-NEXT: {{.*}} = vector_transfer_read
+// CHECK-NEXT: {{.*}} = vector.transfer_read
 // CHECK-NEXT: {{.*}} = affine.apply
 // CHECK-NEXT: {{.*}} = affine.apply
-// CHECK-NEXT: {{.*}} = vector_transfer_read
+// CHECK-NEXT: {{.*}} = vector.transfer_read
 // CHECK-NEXT: {{.*}} = addf {{.*}} : vector<3x16xf32>
 // CHECK-NEXT: {{.*}} = addf {{.*}} : vector<3x16xf32>
 // CHECK-NEXT: {{.*}} = affine.apply
 // CHECK-NEXT: {{.*}} = affine.apply
-// CHECK-NEXT: vector_transfer_write
+// CHECK-NEXT: vector.transfer_write
 // CHECK-NEXT: {{.*}} = affine.apply
 // CHECK-NEXT: {{.*}} = affine.apply
-// CHECK-NEXT: vector_transfer_write
+// CHECK-NEXT: vector.transfer_write
 //
 affine.for %i4 = 0 to %M {
 affine.for %i5 = 0 to %N {
@@ -13,7 +13,7 @@
 // Maps introduced to vectorize fastest varying memory index.
 // CHECK-LABEL: func @vec1d_1
 func @vec1d_1(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
-// CHECK-DAG: [[C0:%[a-z0-9_]+]] = constant 0 : index
+// CHECK-DAG: %[[C0:[a-z0-9_]+]] = constant 0 : index
 // CHECK-DAG: [[ARG_M:%[0-9]+]] = dim %arg0, 0 : memref<?x?xf32>
 // CHECK-DAG: [[ARG_N:%[0-9]+]] = dim %arg0, 1 : memref<?x?xf32>
 // CHECK-DAG: [[ARG_P:%[0-9]+]] = dim %arg1, 2 : memref<?x?x?xf32>
@@ -23,7 +23,7 @@ func @vec1d_1(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
 %cst0 = constant 0 : index
 //
 // CHECK: for {{.*}} step 128
-// CHECK-NEXT: {{.*}} = vector_transfer_read %arg0, [[C0]], [[C0]] {permutation_map: #[[map_proj_d0d1_0]]} : (memref<?x?xf32>, index, index) -> vector<128xf32>
+// CHECK-NEXT: {{.*}} = vector.transfer_read %arg0[%[[C0]], %[[C0]]] {permutation_map: #[[map_proj_d0d1_0]]} : memref<?x?xf32>, vector<128xf32>
 affine.for %i0 = 0 to %M { // vectorized due to scalar -> vector
 %a0 = load %A[%cst0, %cst0] : memref<?x?xf32>
 }
@@ -32,7 +32,7 @@ func @vec1d_1(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {

 // CHECK-LABEL: func @vec1d_2
 func @vec1d_2(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
-// CHECK-DAG: [[C0:%[a-z0-9_]+]] = constant 0 : index
+// CHECK-DAG: %[[C0:[a-z0-9_]+]] = constant 0 : index
 // CHECK-DAG: [[ARG_M:%[0-9]+]] = dim %arg0, 0 : memref<?x?xf32>
 // CHECK-DAG: [[ARG_N:%[0-9]+]] = dim %arg0, 1 : memref<?x?xf32>
 // CHECK-DAG: [[ARG_P:%[0-9]+]] = dim %arg1, 2 : memref<?x?x?xf32>
@@ -42,8 +42,8 @@ func @vec1d_2(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
 %cst0 = constant 0 : index
 //
 // CHECK:for [[IV3:%[a-zA-Z0-9]+]] = 0 to [[ARG_M]] step 128
-// CHECK-NEXT: [[APP3:%[a-zA-Z0-9]+]] = affine.apply {{.*}}[[IV3]]
-// CHECK-NEXT: {{.*}} = vector_transfer_read %arg0, [[C0]], [[APP3]] {permutation_map: #[[map_proj_d0d1_d1]]} : {{.*}} -> vector<128xf32>
+// CHECK-NEXT: %[[APP3:[a-zA-Z0-9]+]] = affine.apply {{.*}}[[IV3]]
+// CHECK-NEXT: {{.*}} = vector.transfer_read %arg0[%[[C0]], %[[APP3]]] {permutation_map: #[[map_proj_d0d1_d1]]} : memref<?x?xf32>, vector<128xf32>
 affine.for %i3 = 0 to %M { // vectorized
 %r3 = affine.apply (d0) -> (d0) (%i3)
 %a3 = load %A[%cst0, %r3] : memref<?x?xf32>
@@ -53,7 +53,7 @@ func @vec1d_2(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {

 // CHECK-LABEL: func @vec1d_3
 func @vec1d_3(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
-// CHECK-DAG: [[C0:%[a-z0-9_]+]] = constant 0 : index
+// CHECK-DAG: %[[C0:[a-z0-9_]+]] = constant 0 : index
 // CHECK-DAG: [[ARG_M:%[0-9]+]] = dim %arg0, 0 : memref<?x?xf32>
 // CHECK-DAG: [[ARG_N:%[0-9]+]] = dim %arg0, 1 : memref<?x?xf32>
 // CHECK-DAG: [[ARG_P:%[0-9]+]] = dim %arg1, 2 : memref<?x?x?xf32>
@@ -64,9 +64,9 @@ func @vec1d_3(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
 //
 // CHECK:for [[IV8:%[i0-9]+]] = 0 to [[ARG_M]] step 128
 // CHECK-NEXT: for [[IV9:%[i0-9]*]] = 0 to [[ARG_N]] {
-// CHECK-NEXT: [[APP9_0:%[0-9]+]] = affine.apply {{.*}}([[IV8]], [[IV9]])
-// CHECK-NEXT: [[APP9_1:%[0-9]+]] = affine.apply {{.*}}([[IV8]], [[IV9]])
-// CHECK-NEXT: {{.*}} = vector_transfer_read %arg0, [[APP9_0]], [[APP9_1]] {permutation_map: #[[map_proj_d0d1_d1]]} : {{.*}} -> vector<128xf32>
+// CHECK-NEXT: %[[APP9_0:[0-9]+]] = affine.apply {{.*}}([[IV8]], [[IV9]])
+// CHECK-NEXT: %[[APP9_1:[0-9]+]] = affine.apply {{.*}}([[IV8]], [[IV9]])
+// CHECK-NEXT: {{.*}} = vector.transfer_read %arg0[%[[APP9_0]], %[[APP9_1]]] {permutation_map: #[[map_proj_d0d1_d1]]} : memref<?x?xf32>, vector<128xf32>
 affine.for %i8 = 0 to %M { // vectorized
 affine.for %i9 = 0 to %N {
 %r90 = affine.apply (d0, d1) -> (d1) (%i8, %i9)
@@ -87,7 +87,7 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
 affine.for %i0 = 0 to %M {
 affine.for %i1 = 0 to %N {
 // CHECK: [[C1:%.*]] = constant splat<vector<128xf32>, 1.000000e+00> : vector<128xf32>
-// CHECK: vector_transfer_write [[C1]], {{.*}} {permutation_map: #[[map_proj_d0d1_d1]]} : vector<128xf32>, memref<?x?xf32>, index, index
+// CHECK: vector.transfer_write [[C1]], {{.*}} {permutation_map: #[[map_proj_d0d1_d1]]} : vector<128xf32>, memref<?x?xf32>
 // non-scoped %f1
 store %f1, %A[%i0, %i1] : memref<?x?xf32, 0>
 }
@@ -95,22 +95,22 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
 affine.for %i2 = 0 to %M {
 affine.for %i3 = 0 to %N {
 // CHECK: [[C3:%.*]] = constant splat<vector<128xf32>, 2.000000e+00> : vector<128xf32>
-// CHECK: vector_transfer_write [[C3]], {{.*}} {permutation_map: #[[map_proj_d0d1_d1]]} : vector<128xf32>, memref<?x?xf32>, index, index
+// CHECK: vector.transfer_write [[C3]], {{.*}} {permutation_map: #[[map_proj_d0d1_d1]]} : vector<128xf32>, memref<?x?xf32>
 // non-scoped %f2
 store %f2, %B[%i2, %i3] : memref<?x?xf32, 0>
 }
 }
 affine.for %i4 = 0 to %M {
 affine.for %i5 = 0 to %N {
-// CHECK: [[A5:%.*]] = vector_transfer_read %0, {{.*}} {permutation_map: #[[map_proj_d0d1_d1]]} : (memref<?x?xf32>, index, index) -> vector<128xf32>
-// CHECK: [[B5:%.*]] = vector_transfer_read %1, {{.*}} {permutation_map: #[[map_proj_d0d1_d1]]} : (memref<?x?xf32>, index, index) -> vector<128xf32>
+// CHECK: [[A5:%.*]] = vector.transfer_read %0[{{.*}}] {permutation_map: #[[map_proj_d0d1_d1]]} : memref<?x?xf32>, vector<128xf32>
+// CHECK: [[B5:%.*]] = vector.transfer_read %1[{{.*}}] {permutation_map: #[[map_proj_d0d1_d1]]} : memref<?x?xf32>, vector<128xf32>
 // CHECK: [[S5:%.*]] = addf [[A5]], [[B5]] : vector<128xf32>
 // CHECK: [[SPLAT1:%.*]] = constant splat<vector<128xf32>, 1.000000e+00> : vector<128xf32>
 // CHECK: [[S6:%.*]] = addf [[S5]], [[SPLAT1]] : vector<128xf32>
 // CHECK: [[SPLAT2:%.*]] = constant splat<vector<128xf32>, 2.000000e+00> : vector<128xf32>
 // CHECK: [[S7:%.*]] = addf [[S5]], [[SPLAT2]] : vector<128xf32>
 // CHECK: [[S8:%.*]] = addf [[S7]], [[S6]] : vector<128xf32>
-// CHECK: vector_transfer_write [[S8]], {{.*}} {permutation_map: #[[map_proj_d0d1_d1]]} : vector<128xf32>, memref<?x?xf32>, index, index
+// CHECK: vector.transfer_write [[S8]], {{.*}} {permutation_map: #[[map_proj_d0d1_d1]]} : vector<128xf32>, memref<?x?xf32>
 %a5 = load %A[%i4, %i5] : memref<?x?xf32, 0>
 %b5 = load %B[%i4, %i5] : memref<?x?xf32, 0>
 %s5 = addf %a5, %b5 : f32
@@ -179,9 +179,9 @@ func @vec_rejected_3(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
 //
 // CHECK:for [[IV4:%[i0-9]+]] = 0 to [[ARG_M]] step 128 {
 // CHECK-NEXT: for [[IV5:%[i0-9]*]] = 0 to [[ARG_N]] {
-// CHECK-NEXT: [[APP50:%[0-9]+]] = affine.apply {{.*}}([[IV4]], [[IV5]])
-// CHECK-NEXT: [[APP51:%[0-9]+]] = affine.apply {{.*}}([[IV4]], [[IV5]])
-// CHECK-NEXT: {{.*}} = vector_transfer_read %arg0, [[APP50]], [[APP51]] {permutation_map: #[[map_proj_d0d1_d1]]} : {{.*}} -> vector<128xf32>
+// CHECK-NEXT: %[[APP50:[0-9]+]] = affine.apply {{.*}}([[IV4]], [[IV5]])
+// CHECK-NEXT: %[[APP51:[0-9]+]] = affine.apply {{.*}}([[IV4]], [[IV5]])
+// CHECK-NEXT: {{.*}} = vector.transfer_read %arg0[%[[APP50]], %[[APP51]]] {permutation_map: #[[map_proj_d0d1_d1]]} : memref<?x?xf32>, vector<128xf32>
 affine.for %i4 = 0 to %M { // vectorized
 affine.for %i5 = 0 to %N { // not vectorized, would vectorize with --test-fastest-varying=1
 %r50 = affine.apply (d0, d1) -> (d1) (%i4, %i5)
@@ -289,7 +289,7 @@ func @vec_rejected_7(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {

 // CHECK-LABEL: func @vec_rejected_8
 func @vec_rejected_8(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
-// CHECK-DAG: [[C0:%[a-z0-9_]+]] = constant 0 : index
+// CHECK-DAG: %[[C0:[a-z0-9_]+]] = constant 0 : index
 // CHECK-DAG: [[ARG_M:%[0-9]+]] = dim %arg0, 0 : memref<?x?xf32>
 // CHECK-DAG: [[ARG_N:%[0-9]+]] = dim %arg0, 1 : memref<?x?xf32>
 // CHECK-DAG: [[ARG_P:%[0-9]+]] = dim %arg1, 2 : memref<?x?x?xf32>
@@ -300,7 +300,7 @@ func @vec_rejected_8(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
 //
 // CHECK: affine.for %i{{[0-9]*}} = 0 to %{{[0-9]*}} {
 // CHECK: for [[IV18:%[a-zA-Z0-9]+]] = 0 to [[ARG_M]] step 128
-// CHECK: {{.*}} = vector_transfer_read %arg0, [[C0]], [[C0]] {permutation_map: #[[map_proj_d0d1_0]]} : {{.*}} -> vector<128xf32>
+// CHECK: {{.*}} = vector.transfer_read %arg0[%[[C0]], %[[C0]]] {permutation_map: #[[map_proj_d0d1_0]]} : memref<?x?xf32>, vector<128xf32>
 affine.for %i17 = 0 to %M { // not vectorized, the 1-D pattern that matched %i18 in DFS post-order prevents vectorizing %i17
 affine.for %i18 = 0 to %M { // vectorized due to scalar -> vector
 %a18 = load %A[%cst0, %cst0] : memref<?x?xf32>
@@ -311,7 +311,7 @@ func @vec_rejected_8(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {

 // CHECK-LABEL: func @vec_rejected_9
 func @vec_rejected_9(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
-// CHECK-DAG: [[C0:%[a-z0-9_]+]] = constant 0 : index
+// CHECK-DAG: %[[C0:[a-z0-9_]+]] = constant 0 : index
 // CHECK-DAG: [[ARG_M:%[0-9]+]] = dim %arg0, 0 : memref<?x?xf32>
 // CHECK-DAG: [[ARG_N:%[0-9]+]] = dim %arg0, 1 : memref<?x?xf32>
 // CHECK-DAG: [[ARG_P:%[0-9]+]] = dim %arg1, 2 : memref<?x?x?xf32>
@@ -322,7 +322,7 @@ func @vec_rejected_9(%A : memref<?x?xf32>, %B : memref<?x?x?xf32>) {
 //
 // CHECK: affine.for %i{{[0-9]*}} = 0 to %{{[0-9]*}} {
 // CHECK: for [[IV18:%[a-zA-Z0-9]+]] = 0 to [[ARG_M]] step 128
-// CHECK: {{.*}} = vector_transfer_read %arg0, [[C0]], [[C0]] {permutation_map: #[[map_proj_d0d1_0]]} : {{.*}} -> vector<128xf32>
+// CHECK: {{.*}} = vector.transfer_read %arg0[%[[C0]], %[[C0]]] {permutation_map: #[[map_proj_d0d1_0]]} : memref<?x?xf32>, vector<128xf32>
 affine.for %i17 = 0 to %M { // not vectorized, the 1-D pattern that matched %i18 in DFS post-order prevents vectorizing %i17
 affine.for %i18 = 0 to %M { // vectorized due to scalar -> vector
 %a18 = load %A[%cst0, %cst0] : memref<?x?xf32>
@@ -22,7 +22,7 @@ func @vec2d(%A : memref<?x?x?xf32>) {
 // affine.for %i0 = 0 to %0 {
 // affine.for %i1 = 0 to %1 step 32 {
 // affine.for %i2 = 0 to %2 step 256 {
-// %3 = "vector_transfer_read"(%arg0, %i0, %i1, %i2) : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
+// %3 = "vector.transfer_read"(%arg0, %i0, %i1, %i2) : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
 affine.for %i0 = 0 to %M {
 affine.for %i1 = 0 to %N {
 affine.for %i2 = 0 to %P {
@@ -54,7 +54,7 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
 affine.for %i0 = 0 to %M {
 affine.for %i1 = 0 to %N {
 // CHECK: [[C1:%.*]] = constant splat<vector<32x256xf32>, 1.000000e+00> : vector<32x256xf32>
-// CHECK: vector_transfer_write [[C1]], {{.*}} {permutation_map: #[[map_id2]]} : vector<32x256xf32>, memref<?x?xf32>, index, index
+// CHECK: vector.transfer_write [[C1]], {{.*}} {permutation_map: #[[map_id2]]} : vector<32x256xf32>, memref<?x?xf32>
 // non-scoped %f1
 store %f1, %A[%i0, %i1] : memref<?x?xf32, 0>
 }
@@ -62,22 +62,22 @@ func @vector_add_2d(%M : index, %N : index) -> f32 {
 affine.for %i2 = 0 to %M {
 affine.for %i3 = 0 to %N {
 // CHECK: [[C3:%.*]] = constant splat<vector<32x256xf32>, 2.000000e+00> : vector<32x256xf32>
-// CHECK: vector_transfer_write [[C3]], {{.*}} {permutation_map: #[[map_id2]]} : vector<32x256xf32>, memref<?x?xf32>, index, index
+// CHECK: vector.transfer_write [[C3]], {{.*}} {permutation_map: #[[map_id2]]} : vector<32x256xf32>, memref<?x?xf32>
 // non-scoped %f2
 store %f2, %B[%i2, %i3] : memref<?x?xf32, 0>
 }
 }
 affine.for %i4 = 0 to %M {
 affine.for %i5 = 0 to %N {
-// CHECK: [[A5:%.*]] = vector_transfer_read %0, {{.*}} {permutation_map: #[[map_id2]]} : (memref<?x?xf32>, index, index) -> vector<32x256xf32>
-// CHECK: [[B5:%.*]] = vector_transfer_read %1, {{.*}} {permutation_map: #[[map_id2]]} : (memref<?x?xf32>, index, index) -> vector<32x256xf32>
+// CHECK: [[A5:%.*]] = vector.transfer_read %0[{{.*}}] {permutation_map: #[[map_id2]]} : memref<?x?xf32>, vector<32x256xf32>
+// CHECK: [[B5:%.*]] = vector.transfer_read %1[{{.*}}] {permutation_map: #[[map_id2]]} : memref<?x?xf32>, vector<32x256xf32>
 // CHECK: [[S5:%.*]] = addf [[A5]], [[B5]] : vector<32x256xf32>
 // CHECK: [[SPLAT1:%.*]] = constant splat<vector<32x256xf32>, 1.000000e+00> : vector<32x256xf32>
 // CHECK: [[S6:%.*]] = addf [[S5]], [[SPLAT1]] : vector<32x256xf32>
 // CHECK: [[SPLAT2:%.*]] = constant splat<vector<32x256xf32>, 2.000000e+00> : vector<32x256xf32>
 // CHECK: [[S7:%.*]] = addf [[S5]], [[SPLAT2]] : vector<32x256xf32>
 // CHECK: [[S8:%.*]] = addf [[S7]], [[S6]] : vector<32x256xf32>
-// CHECK: vector_transfer_write [[S8]], {{.*}} {permutation_map: #[[map_id2]]} : vector<32x256xf32>, memref<?x?xf32>, index, index
+// CHECK: vector.transfer_write [[S8]], {{.*}} {permutation_map: #[[map_id2]]} : vector<32x256xf32>, memref<?x?xf32>
 //
 %a5 = load %A[%i4, %i5] : memref<?x?xf32, 0>
 %b5 = load %B[%i4, %i5] : memref<?x?xf32, 0>
@@ -110,7 +110,7 @@ func @vectorize_matmul(%arg0: memref<?x?xf32>, %arg1: memref<?x?xf32>, %arg2: me
 // VECT: {{.*}} #[[map_id1]](%[[M]]) step 4 {
 // VECT-NEXT: {{.*}} #[[map_id1]](%[[N]]) step 8 {
 // VECT: %[[VC0:.*]] = constant splat<vector<4x8xf32>, 0.000000e+00> : vector<4x8xf32>
-// VECT-NEXT: vector_transfer_write %[[VC0]], %arg2, %{{.*}}, %{{.*}} {permutation_map: #[[map_id2]]}
+// VECT-NEXT: vector.transfer_write %[[VC0]], %arg2[%{{.*}}, %{{.*}}] {permutation_map: #[[map_id2]]} : vector<4x8xf32>, memref<?x?xf32>
 affine.for %i0 = (d0) -> (d0)(%c0) to (d0) -> (d0)(%M) {
 affine.for %i1 = (d0) -> (d0)(%c0) to (d0) -> (d0)(%N) {
 %cst = constant 0.000000e+00 : f32
@@ -120,12 +120,12 @@ func @vectorize_matmul(%arg0: memref<?x?xf32>, %arg1: memref<?x?xf32>, %arg2: me
 // VECT: affine.for %[[I2:.*]] = #[[map_id1]](%[[C0]]) to #[[map_id1]](%[[M]]) step 4 {
 // VECT-NEXT: affine.for %[[I3:.*]] = #[[map_id1]](%[[C0]]) to #[[map_id1]](%[[N]]) step 8 {
 // VECT-NEXT: affine.for %[[I4:.*]] = #map5(%[[C0]]) to #[[map_id1]](%[[K]]) {
-// VECT-NEXT: %[[A:.*]] = vector_transfer_read %arg1, %[[I4]], %[[I3]] {permutation_map: #[[map_proj_d0d1_zerod1]]}
-// VECT-NEXT: %[[B:.*]] = vector_transfer_read %arg0, %[[I2]], %[[I4]] {permutation_map: #[[map_proj_d0d1_d0zero]]}
+// VECT-NEXT: %[[A:.*]] = vector.transfer_read %arg1[%[[I4]], %[[I3]]] {permutation_map: #[[map_proj_d0d1_zerod1]]} : memref<?x?xf32>, vector<4x8xf32>
+// VECT-NEXT: %[[B:.*]] = vector.transfer_read %arg0[%[[I2]], %[[I4]]] {permutation_map: #[[map_proj_d0d1_d0zero]]} : memref<?x?xf32>, vector<4x8xf32>
 // VECT-NEXT: %[[C:.*]] = mulf %[[B]], %[[A]] : vector<4x8xf32>
-// VECT-NEXT: %[[D:.*]] = vector_transfer_read %arg2, %[[I2]], %[[I3]] {permutation_map: #[[map_id2]]}
+// VECT-NEXT: %[[D:.*]] = vector.transfer_read %arg2[%[[I2]], %[[I3]]] {permutation_map: #[[map_id2]]} : memref<?x?xf32>, vector<4x8xf32>
 // VECT-NEXT: %[[E:.*]] = addf %[[D]], %[[C]] : vector<4x8xf32>
-// VECT-NEXT: vector_transfer_write %[[E]], %arg2, %[[I2]], %[[I3]] {permutation_map: #[[map_id2]]} : vector<4x8xf32>, memref<?x?xf32>, index, index
+// VECT-NEXT: vector.transfer_write %[[E]], %arg2[%[[I2]], %[[I3]]] {permutation_map: #[[map_id2]]} : vector<4x8xf32>, memref<?x?xf32>
 affine.for %i2 = (d0) -> (d0)(%c0) to (d0) -> (d0)(%M) {
 affine.for %i3 = (d0) -> (d0)(%c0) to (d0) -> (d0)(%N) {
 affine.for %i4 = (d0) -> (d0)(%c0) to (d0) -> (d0)(%K) {
@@ -12,7 +12,7 @@ func @vec3d(%A : memref<?x?x?xf32>) {
 // CHECK: affine.for %i2 = 0 to %0 step 32 {
 // CHECK: affine.for %i3 = 0 to %1 step 64 {
 // CHECK: affine.for %i4 = 0 to %2 step 256 {
-// CHECK: %3 = vector_transfer_read %arg0, %i2, %i3, %i4 {permutation_map: #[[map_proj_d0d1d2_d0d1d2]]} : (memref<?x?x?xf32>, index, index, index) -> vector<32x64x256xf32>
+// CHECK: %3 = vector.transfer_read %arg0[%i2, %i3, %i4] {permutation_map: #[[map_proj_d0d1d2_d0d1d2]]} : memref<?x?x?xf32>, vector<32x64x256xf32>
 affine.for %t0 = 0 to %0 {
 affine.for %t1 = 0 to %0 {
 affine.for %i0 = 0 to %0 {
@@ -10,7 +10,7 @@ func @vec2d(%A : memref<?x?x?xf32>) {
 // CHECK: affine.for %i0 = 0 to %0 step 32
 // CHECK: affine.for %i1 = 0 to %1 {
 // CHECK: affine.for %i2 = 0 to %2 step 256
-// CHECK: {{.*}} = vector_transfer_read %arg0, %i0, %i1, %i2 {permutation_map: #[[map_proj_d0d1d2_d0d2]]} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
+// CHECK: {{.*}} = vector.transfer_read %arg0[%i0, %i1, %i2] {permutation_map: #[[map_proj_d0d1d2_d0d2]]} : memref<?x?x?xf32>, vector<32x256xf32>
 affine.for %i0 = 0 to %M {
 affine.for %i1 = 0 to %N {
 affine.for %i2 = 0 to %P {
@@ -22,7 +22,7 @@ func @vec2d(%A : memref<?x?x?xf32>) {
 // CHECK: affine.for %i3 = 0 to %0 step 32
 // CHECK: affine.for %i4 = 0 to %1 step 256
 // CHECK: affine.for %i5 = 0 to %2 {
-// CHECK: {{.*}} = vector_transfer_read %arg0, %i4, %i5, %i3 {permutation_map: #[[map_proj_d0d1d2_d2d0]]} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
+// CHECK: {{.*}} = vector.transfer_read %arg0[%i4, %i5, %i3] {permutation_map: #[[map_proj_d0d1d2_d2d0]]} : memref<?x?x?xf32>, vector<32x256xf32>
 affine.for %i3 = 0 to %M {
 affine.for %i4 = 0 to %N {
 affine.for %i5 = 0 to %P {
@@ -40,12 +40,12 @@ func @vec2d_imperfectly_nested(%A : memref<?x?x?xf32>) {
 // CHECK: affine.for %i0 = 0 to %0 step 32 {
 // CHECK: affine.for %i1 = 0 to %1 {
 // CHECK: affine.for %i2 = 0 to %2 step 256 {
-// CHECK: %3 = vector_transfer_read %arg0, %i2, %i1, %i0 {permutation_map: #[[map_proj_d0d1d2_d2d0]]} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
+// CHECK: %3 = vector.transfer_read %arg0[%i2, %i1, %i0] {permutation_map: #[[map_proj_d0d1d2_d2d0]]} : memref<?x?x?xf32>, vector<32x256xf32>
 // CHECK: affine.for %i3 = 0 to %1 step 256 {
 // CHECK: affine.for %i4 = 0 to %2 {
-// CHECK: %4 = vector_transfer_read %arg0, %i3, %i4, %i0 {permutation_map: #[[map_proj_d0d1d2_d2d0]]} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
+// CHECK: %4 = vector.transfer_read %arg0[%i3, %i4, %i0] {permutation_map: #[[map_proj_d0d1d2_d2d0]]} : memref<?x?x?xf32>, vector<32x256xf32>
 // CHECK: affine.for %i5 = 0 to %2 {
-// CHECK: %5 = vector_transfer_read %arg0, %i3, %i5, %i0 {permutation_map: #[[map_proj_d0d1d2_d2d0]]} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
+// CHECK: %5 = vector.transfer_read %arg0[%i3, %i5, %i0] {permutation_map: #[[map_proj_d0d1d2_d2d0]]} : memref<?x?x?xf32>, vector<32x256xf32>
 affine.for %i0 = 0 to %0 {
 affine.for %i1 = 0 to %1 {
 affine.for %i2 = 0 to %2 {
@@ -22,7 +22,7 @@ func @vec2d(%A : memref<?x?x?xf32>) {
 // CHECK: affine.for %i3 = 0 to %0 step 32
 // CHECK: affine.for %i4 = 0 to %1 {
 // CHECK: affine.for %i5 = 0 to %2 step 256
-// CHECK: {{.*}} = vector_transfer_read %arg0, %i4, %i5, %i3 {permutation_map: #[[map_proj_d0d1d2_d2d1]]} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
+// CHECK: {{.*}} = vector.transfer_read %arg0[%i4, %i5, %i3] {permutation_map: #[[map_proj_d0d1d2_d2d1]]} : memref<?x?x?xf32>, vector<32x256xf32>
 affine.for %i3 = 0 to %M {
 affine.for %i4 = 0 to %N {
 affine.for %i5 = 0 to %P {
@@ -40,12 +40,12 @@ func @vec2d_imperfectly_nested(%A : memref<?x?x?xf32>) {
 // CHECK: affine.for %i0 = 0 to %0 step 32 {
 // CHECK: affine.for %i1 = 0 to %1 step 256 {
 // CHECK: affine.for %i2 = 0 to %2 {
-// CHECK: %3 = vector_transfer_read %arg0, %i2, %i1, %i0 {permutation_map: #[[map_proj_d0d1d2_d2d1]]} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
+// CHECK: %3 = vector.transfer_read %arg0[%i2, %i1, %i0] {permutation_map: #[[map_proj_d0d1d2_d2d1]]} : memref<?x?x?xf32>, vector<32x256xf32>
 // CHECK: affine.for %i3 = 0 to %1 {
 // CHECK: affine.for %i4 = 0 to %2 step 256 {
-// CHECK: %4 = vector_transfer_read %arg0, %i3, %i4, %i0 {permutation_map: #[[map_proj_d0d1d2_d2d1]]} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
+// CHECK: %4 = vector.transfer_read %arg0[%i3, %i4, %i0] {permutation_map: #[[map_proj_d0d1d2_d2d1]]} : memref<?x?x?xf32>, vector<32x256xf32>
 // CHECK: affine.for %i5 = 0 to %2 step 256 {
-// CHECK: %5 = vector_transfer_read %arg0, %i3, %i5, %i0 {permutation_map: #[[map_proj_d0d1d2_d2d1]]} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
+// CHECK: %5 = vector.transfer_read %arg0[%i3, %i5, %i0] {permutation_map: #[[map_proj_d0d1d2_d2d1]]} : memref<?x?x?xf32>, vector<32x256xf32>
 affine.for %i0 = 0 to %0 {
 affine.for %i1 = 0 to %1 {
 affine.for %i2 = 0 to %2 {