2018-09-20 12:35:11 +08:00
|
|
|
// RUN: mlir-opt %s | FileCheck %s
|
2018-11-18 00:24:07 +08:00
|
|
|
// Verify the printed output can be parsed.
|
|
|
|
// RUN: mlir-opt %s | mlir-opt | FileCheck %s
|
2019-02-06 03:47:02 +08:00
|
|
|
// Verify the generic form can be parsed.
|
|
|
|
// RUN: mlir-opt -mlir-print-op-generic %s | mlir-opt | FileCheck %s
|
2018-07-25 01:13:31 +08:00
|
|
|
|
2018-08-01 07:21:36 +08:00
|
|
|
// CHECK: #map0 = (d0) -> (d0 + 1)
|
2018-07-29 00:36:25 +08:00
|
|
|
|
2019-01-25 05:04:50 +08:00
|
|
|
// CHECK: #map1 = ()[s0] -> (s0 + 1)
|
[MLIR] Add VectorTransferOps
This CL implements and uses VectorTransferOps in lieu of the former custom
call op. Tests are updated accordingly.
VectorTransferOps come in 2 flavors: VectorTransferReadOp and
VectorTransferWriteOp.
VectorTransferOps can be thought of as a backend-independent
pseudo op/library call that needs to be legalized to MLIR (whiteboxed) before
it can be lowered to backend-dependent IR.
Note that the current implementation does not yet support a real permutation
map. Proper support will come in a followup CL.
VectorTransferReadOp
====================
VectorTransferReadOp performs a blocking read from a scalar memref
location into a super-vector of the same elemental type. The operation is
called 'read' rather than 'load' because a super-vector generally does not
fit in a single hardware register. As a consequence, lowering
VectorTransferReadOp will generally require memory transfers. A
VectorTransferReadOp is thus a mid-level abstraction that supports
super-vectorization with non-effecting padding for full-tile-only code.
A vector transfer read has semantics similar to a vector load, with additional
support for:
1. an optional value of the elemental type of the MemRef. This value
supports non-effecting padding and is inserted in places where the
vector read exceeds the MemRef bounds. If the value is not specified,
the access is statically guaranteed to be within bounds;
2. an attribute of type AffineMap to specify a slice of the original
MemRef access and its transposition into the super-vector shape. The
permutation_map is an unbounded AffineMap that must represent a
permutation from the MemRef dim space projected onto the vector dim
space.
Example:
```mlir
%A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>
...
%val = `ssa-value` : f32
// let %i, %j, %k, %l be ssa-values of type index
%v0 = vector_transfer_read %A, %i, %j, %k, %l
  {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
  (memref<?x?x?x?xf32>, index, index, index, index) ->
  vector<16x32x64xf32>
%v1 = vector_transfer_read %A, %i, %j, %k, %l, %val
  {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
  (memref<?x?x?x?xf32>, index, index, index, index, f32) ->
  vector<16x32x64xf32>
```
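The read semantics described above can be modeled in a few lines of Python. This is only an illustrative sketch, not the MLIR implementation: the function name `transfer_read`, the dict-based memref model, and the encoding of the permutation map as a tuple (`perm[v]` is the memref dimension walked by vector dimension `v`) are all assumptions made for the example.

```python
from itertools import product


def transfer_read(memref, shape, base, vector_shape, perm, padding=None):
    """Sketch of a permuted super-vector read with optional padding.

    memref:       dict mapping coordinate tuples to scalars (hypothetical model)
    shape:        extent of each memref dimension
    base:         starting coordinates (%i, %j, %k, %l in the example above)
    vector_shape: extent of each vector dimension
    perm:         perm[v] is the memref dimension walked by vector dimension v,
                  e.g. (3, 1, 2) for (d0, d1, d2, d3) -> (d3, d1, d2)
    padding:      value used for out-of-bounds elements; None means the caller
                  guarantees the access is in bounds
    """
    result = {}
    for offsets in product(*(range(n) for n in vector_shape)):
        coords = list(base)
        for vec_dim, mem_dim in enumerate(perm):
            coords[mem_dim] += offsets[vec_dim]
        in_bounds = all(0 <= c < n for c, n in zip(coords, shape))
        if in_bounds:
            result[offsets] = memref[tuple(coords)]
        elif padding is not None:
            result[offsets] = padding
        else:
            raise IndexError(f"read at {tuple(coords)} is out of bounds")
    return result


# A 2x3 memref read into a 2x2 vector through the transposing map
# (d0, d1) -> (d1, d0), starting at column 2 so that half the read is padded.
shape = (2, 3)
memref = {(r, c): 10 * r + c for r in range(2) for c in range(3)}
vec = transfer_read(memref, shape, base=(0, 2), vector_shape=(2, 2),
                    perm=(1, 0), padding=-1)
```

Here vector dimension 0 walks the memref columns, so the second vector row falls past the last column and is filled with the padding value.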
VectorTransferWriteOp
=====================
VectorTransferWriteOp performs a blocking write from a super-vector to
a scalar memref of the same elemental type. The operation is
called 'write' rather than 'store' because a super-vector generally does not
fit in a single hardware register. As a consequence, lowering
VectorTransferWriteOp will generally require memory transfers. A
VectorTransferWriteOp is thus a mid-level abstraction that supports
super-vectorization with non-effecting padding for full-tile-only code.
A vector transfer write has semantics similar to a vector store, with
additional support for handling out-of-bounds situations.
Example:
```mlir
%A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>
%val = `ssa-value` : vector<16x32x64xf32>
// let %i, %j, %k, %l be ssa-values of type index
vector_transfer_write %val, %A, %i, %j, %k, %l
  {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
  (vector<16x32x64xf32>, memref<?x?x?x?xf32>, index, index, index, index)
```
PiperOrigin-RevId: 223873234
2018-12-04 07:21:27 +08:00
|
|
|
// CHECK-DAG: #[[map_proj_d0d1_d0:map[0-9]+]] = (d0, d1) -> (d0)
|
|
|
|
// CHECK-DAG: #[[map_proj_d0d1_d1:map[0-9]+]] = (d0, d1) -> (d1)
|
|
|
|
// CHECK-DAG: #[[map_proj_d0d1_d1d0:map[0-9]+]] = (d0, d1) -> (d1, d0)
|
2018-07-25 01:13:31 +08:00
|
|
|
|
2019-01-03 02:20:00 +08:00
|
|
|
// CHECK-LABEL: func @func_with_ops(%arg0: f32) {
|
|
|
|
func @func_with_ops(f32) {
|
2018-12-30 03:32:37 +08:00
|
|
|
^bb0(%a : f32):
|
2018-08-02 01:43:18 +08:00
|
|
|
// CHECK: %0 = "getTensor"() : () -> tensor<4x4x?xf32>
|
2018-07-25 01:13:31 +08:00
|
|
|
%t = "getTensor"() : () -> tensor<4x4x?xf32>
|
|
|
|
|
2018-08-02 01:43:18 +08:00
|
|
|
// CHECK: %1 = dim %0, 2 : tensor<4x4x?xf32>
|
2019-03-03 10:03:03 +08:00
|
|
|
%t2 = "std.dim"(%t){index: 2} : (tensor<4x4x?xf32>) -> index
|
2018-07-25 01:13:31 +08:00
|
|
|
|
2018-08-02 01:43:18 +08:00
|
|
|
// CHECK: %2 = addf %arg0, %arg0 : f32
|
2019-03-03 10:03:03 +08:00
|
|
|
%x = "std.addf"(%a, %a) : (f32,f32) -> (f32)
|
2018-07-25 01:13:31 +08:00
|
|
|
|
|
|
|
// CHECK: return
|
|
|
|
return
|
|
|
|
}
|
|
|
|
|
2019-01-03 02:20:00 +08:00
|
|
|
// CHECK-LABEL: func @standard_instrs(%arg0: tensor<4x4x?xf32>, %arg1: f32, %arg2: i32, %arg3: index) {
|
|
|
|
func @standard_instrs(tensor<4x4x?xf32>, f32, i32, index) {
|
2018-12-30 03:32:37 +08:00
|
|
|
^bb42(%t: tensor<4x4x?xf32>, %f: f32, %i: i32, %idx : index):
|
2018-08-02 01:43:18 +08:00
|
|
|
// CHECK: %0 = dim %arg0, 2 : tensor<4x4x?xf32>
|
2019-03-03 10:03:03 +08:00
|
|
|
%a = "std.dim"(%t){index: 2} : (tensor<4x4x?xf32>) -> index
|
2018-07-25 01:13:31 +08:00
|
|
|
|
2018-08-02 01:43:18 +08:00
|
|
|
// CHECK: %1 = dim %arg0, 2 : tensor<4x4x?xf32>
|
2018-07-26 02:15:20 +08:00
|
|
|
%a2 = dim %t, 2 : tensor<4x4x?xf32>
|
Enable arithmetics for index types.
Arithmetic and comparison instructions are necessary to implement, e.g.,
control flow when lowering MLFunctions to CFGFunctions. (While it is possible
to replace some of the arithmetics by affine_apply instructions for loop
bounds, it is still necessary for loop bounds checking, steps, if-conditions,
non-trivial memref subscripts, etc.) Furthermore, indirect accesses, e.g.,
into lookup tables for large embeddings, may require operating on tensors of
indices. For example, the equivalents of the C code "LUT[Index[i]]" or
"ResultIndex[i] = i + j", where i and j are loop induction variables, require
arithmetic on indices as well as the ability to operate on tensors
thereof. Allow arithmetic and comparison operations to apply to index types by
declaring them integer-like. Allow tensors whose element type is index for
indirection purposes.
The absence of vectors with "index" element type is explicitly tested, but the
only justification for this restriction in the CL introducing the test is
"because we don't need them". Do NOT enable vectors of index types, although
it makes vector and tensor types inconsistent with respect to allowed element
types.
PiperOrigin-RevId: 220614055
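The two C snippets quoted in the message above can be spelled out as a small Python sketch. The data values and the names `lut`, `index`, and `result_index` are hypothetical and exist only to show why index values need both arithmetic and tensor-like containers.

```python
# Hypothetical data; the names mirror the C snippets quoted in the message.
lut = [100, 200, 300, 400]
index = [3, 0, 2]  # plays the role of a "tensor of indices"

# LUT[Index[i]]: an indirect access whose subscript is itself loaded from memory.
gathered = [lut[index[i]] for i in range(len(index))]

# ResultIndex[i] = i + j: arithmetic whose operands and result are index values,
# stored back into a container of indices.
j = 5
result_index = [i + j for i in range(len(index))]
```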
2018-11-08 20:04:32 +08:00
|
|
|
|
2018-08-02 01:43:18 +08:00
|
|
|
// CHECK: %2 = addf %arg1, %arg1 : f32
|
2019-03-03 10:03:03 +08:00
|
|
|
%f2 = "std.addf"(%f, %f) : (f32,f32) -> f32
|
2018-07-25 01:41:30 +08:00
|
|
|
|
2018-08-02 01:43:18 +08:00
|
|
|
// CHECK: %3 = addf %2, %2 : f32
|
2018-07-26 02:15:20 +08:00
|
|
|
%f3 = addf %f2, %f2 : f32
|
2018-11-08 20:04:32 +08:00
|
|
|
|
2018-10-04 00:43:13 +08:00
|
|
|
// CHECK: %4 = addi %arg2, %arg2 : i32
|
2019-03-03 10:03:03 +08:00
|
|
|
%i2 = "std.addi"(%i, %i) : (i32,i32) -> i32
|
2018-10-04 00:43:13 +08:00
|
|
|
|
|
|
|
// CHECK: %5 = addi %4, %4 : i32
|
|
|
|
%i3 = addi %i2, %i2 : i32
|
2018-11-08 20:04:32 +08:00
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = addi %arg3, %arg3 : index
|
|
|
|
%idx1 = addi %idx, %idx : index
|
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = addi %arg3, %{{[0-9]+}} : index
|
2019-03-03 10:03:03 +08:00
|
|
|
%idx2 = "std.addi"(%idx, %idx1) : (index, index) -> index
|
2018-11-08 20:04:32 +08:00
|
|
|
|
|
|
|
// CHECK: %8 = subf %arg1, %arg1 : f32
|
2019-03-03 10:03:03 +08:00
|
|
|
%f4 = "std.subf"(%f, %f) : (f32,f32) -> f32
|
2018-10-04 00:43:13 +08:00
|
|
|
|
2018-11-08 20:04:32 +08:00
|
|
|
// CHECK: %9 = subf %8, %8 : f32
|
2018-10-04 00:43:13 +08:00
|
|
|
%f5 = subf %f4, %f4 : f32
|
2018-11-08 20:04:32 +08:00
|
|
|
|
|
|
|
// CHECK: %10 = subi %arg2, %arg2 : i32
|
2019-03-03 10:03:03 +08:00
|
|
|
%i4 = "std.subi"(%i, %i) : (i32,i32) -> i32
|
2018-10-04 00:43:13 +08:00
|
|
|
|
2018-11-08 20:04:32 +08:00
|
|
|
// CHECK: %11 = subi %10, %10 : i32
|
2018-10-04 00:43:13 +08:00
|
|
|
%i5 = subi %i4, %i4 : i32
|
2018-11-08 20:04:32 +08:00
|
|
|
|
|
|
|
// CHECK: %12 = mulf %2, %2 : f32
|
2018-10-04 00:43:13 +08:00
|
|
|
%f6 = mulf %f2, %f2 : f32
|
2018-11-08 20:04:32 +08:00
|
|
|
|
|
|
|
// CHECK: %13 = muli %4, %4 : i32
|
2018-10-04 00:43:13 +08:00
|
|
|
%i6 = muli %i2, %i2 : i32
|
2018-07-25 01:13:31 +08:00
|
|
|
|
2018-08-02 01:43:18 +08:00
|
|
|
// CHECK: %c42_i32 = constant 42 : i32
|
2019-03-03 10:03:03 +08:00
|
|
|
%x = "std.constant"(){value: 42 : i32} : () -> i32
|
2018-08-02 01:43:18 +08:00
|
|
|
|
|
|
|
// CHECK: %c42_i32_0 = constant 42 : i32
|
|
|
|
%7 = constant 42 : i32
|
2018-08-03 07:54:36 +08:00
|
|
|
|
2019-03-03 10:03:03 +08:00
|
|
|
// CHECK: %c43 = constant {crazy: "std.foo"} 43 : index
|
|
|
|
%8 = constant {crazy: "std.foo"} 43: index
|
2018-08-17 07:56:40 +08:00
|
|
|
|
2018-08-20 12:17:22 +08:00
|
|
|
// CHECK: %cst = constant 4.300000e+01 : bf16
|
2018-08-17 07:56:40 +08:00
|
|
|
%9 = constant 43.0 : bf16
|
2018-08-20 12:17:22 +08:00
|
|
|
|
2019-01-03 02:20:00 +08:00
|
|
|
// CHECK: %f = constant @func_with_ops : (f32) -> ()
|
|
|
|
%10 = constant @func_with_ops : (f32) -> ()
|
2018-08-20 12:17:22 +08:00
|
|
|
|
|
|
|
// CHECK: %f_1 = constant @affine_apply : () -> ()
|
|
|
|
%11 = constant @affine_apply : () -> ()
|
|
|
|
|
2018-08-22 08:55:22 +08:00
|
|
|
// CHECK: %f_2 = constant @affine_apply : () -> ()
|
|
|
|
%12 = constant @affine_apply : () -> ()
|
|
|
|
|
2018-10-30 01:22:49 +08:00
|
|
|
// CHECK: %cst_3 = constant splat<vector<4xi32>, 0> : vector<4xi32>
|
|
|
|
%13 = constant splat<vector<4 x i32>, 0> : vector<4 x i32>
|
|
|
|
|
2018-12-12 05:49:43 +08:00
|
|
|
// CHECK: %cst_4 = constant splat<tensor<42xi32>, 0> : tensor<42xi32>
|
|
|
|
%tci32 = constant splat<tensor<42 x i32>, 0> : tensor<42 x i32>
|
2018-11-08 20:04:32 +08:00
|
|
|
|
2018-12-12 05:49:43 +08:00
|
|
|
// CHECK: %cst_5 = constant splat<vector<42xi32>, 0> : vector<42xi32>
|
|
|
|
%vci32 = constant splat<vector<42 x i32>, 0> : vector<42 x i32>
|
2018-11-08 22:48:09 +08:00
|
|
|
|
2018-11-08 20:02:00 +08:00
|
|
|
// CHECK: %{{[0-9]+}} = cmpi "eq", %{{[0-9]+}}, %{{[0-9]+}} : i32
|
|
|
|
%14 = cmpi "eq", %i3, %i4 : i32
|
|
|
|
|
|
|
|
// Predicate 1 means inequality comparison.
|
|
|
|
// CHECK: %{{[0-9]+}} = cmpi "ne", %{{[0-9]+}}, %{{[0-9]+}} : i32
|
2019-03-03 10:03:03 +08:00
|
|
|
%15 = "std.cmpi"(%i3, %i4) {predicate: 1} : (i32, i32) -> i1
|
2018-11-08 20:02:00 +08:00
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = cmpi "slt", %cst_3, %cst_3 : vector<4xi32>
|
|
|
|
%16 = cmpi "slt", %13, %13 : vector<4 x i32>
|
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = cmpi "ne", %cst_3, %cst_3 : vector<4xi32>
|
2019-03-03 10:03:03 +08:00
|
|
|
%17 = "std.cmpi"(%13, %13) {predicate: 1} : (vector<4 x i32>, vector<4 x i32>) -> vector<4 x i1>
|
2018-11-08 20:02:00 +08:00
|
|
|
|
2018-11-08 20:04:32 +08:00
|
|
|
// CHECK: %{{[0-9]+}} = cmpi "slt", %arg3, %arg3 : index
|
|
|
|
%18 = cmpi "slt", %idx, %idx : index
|
|
|
|
|
2018-12-12 05:49:43 +08:00
|
|
|
// CHECK: %{{[0-9]+}} = cmpi "eq", %cst_4, %cst_4 : tensor<42xi32>
|
|
|
|
%19 = cmpi "eq", %tci32, %tci32 : tensor<42 x i32>
|
2018-11-08 20:04:32 +08:00
|
|
|
|
2018-12-12 05:49:43 +08:00
|
|
|
// CHECK: %{{[0-9]+}} = cmpi "eq", %cst_5, %cst_5 : vector<42xi32>
|
|
|
|
%20 = cmpi "eq", %vci32, %vci32 : vector<42 x i32>
|
2018-11-08 22:48:09 +08:00
|
|
|
|
2018-11-28 23:08:55 +08:00
|
|
|
// CHECK: %{{[0-9]+}} = select %{{[0-9]+}}, %arg3, %arg3 : index
|
|
|
|
%21 = select %18, %idx, %idx : index
|
|
|
|
|
2018-12-12 05:49:43 +08:00
|
|
|
// CHECK: %{{[0-9]+}} = select %{{[0-9]+}}, %cst_4, %cst_4 : tensor<42xi32>
|
|
|
|
%22 = select %19, %tci32, %tci32 : tensor<42 x i32>
|
2018-11-28 23:08:55 +08:00
|
|
|
|
2018-12-12 05:49:43 +08:00
|
|
|
// CHECK: %{{[0-9]+}} = select %{{[0-9]+}}, %cst_5, %cst_5 : vector<42xi32>
|
|
|
|
%23 = select %20, %vci32, %vci32 : vector<42 x i32>
|
2018-11-28 23:08:55 +08:00
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = select %{{[0-9]+}}, %arg3, %arg3 : index
|
2019-03-03 10:03:03 +08:00
|
|
|
%24 = "std.select"(%18, %idx, %idx) : (i1, index, index) -> index
|
2018-11-28 23:08:55 +08:00
|
|
|
|
2018-12-12 05:49:43 +08:00
|
|
|
// CHECK: %{{[0-9]+}} = select %{{[0-9]+}}, %cst_4, %cst_4 : tensor<42xi32>
|
2019-03-03 10:03:03 +08:00
|
|
|
%25 = "std.select"(%19, %tci32, %tci32) : (tensor<42 x i1>, tensor<42 x i32>, tensor<42 x i32>) -> tensor<42 x i32>
|
2018-11-28 23:08:55 +08:00
|
|
|
|
2019-01-07 06:08:42 +08:00
|
|
|
// CHECK: %{{[0-9]+}} = divis %arg2, %arg2 : i32
|
|
|
|
%26 = divis %i, %i : i32
|
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = divis %arg3, %arg3 : index
|
|
|
|
%27 = divis %idx, %idx : index
|
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = divis %cst_5, %cst_5 : vector<42xi32>
|
|
|
|
%28 = divis %vci32, %vci32 : vector<42 x i32>
|
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = divis %cst_4, %cst_4 : tensor<42xi32>
|
|
|
|
%29 = divis %tci32, %tci32 : tensor<42 x i32>
|
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = divis %arg2, %arg2 : i32
|
2019-03-03 10:03:03 +08:00
|
|
|
%30 = "std.divis"(%i, %i) : (i32, i32) -> i32
|
2019-01-07 06:08:42 +08:00
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = diviu %arg2, %arg2 : i32
|
|
|
|
%31 = diviu %i, %i : i32
|
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = diviu %arg3, %arg3 : index
|
|
|
|
%32 = diviu %idx, %idx : index
|
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = diviu %cst_5, %cst_5 : vector<42xi32>
|
|
|
|
%33 = diviu %vci32, %vci32 : vector<42 x i32>
|
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = diviu %cst_4, %cst_4 : tensor<42xi32>
|
|
|
|
%34 = diviu %tci32, %tci32 : tensor<42 x i32>
|
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = diviu %arg2, %arg2 : i32
|
2019-03-03 10:03:03 +08:00
|
|
|
%35 = "std.diviu"(%i, %i) : (i32, i32) -> i32
|
2019-01-07 06:08:42 +08:00
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = remis %arg2, %arg2 : i32
|
|
|
|
%36 = remis %i, %i : i32
|
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = remis %arg3, %arg3 : index
|
|
|
|
%37 = remis %idx, %idx : index
|
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = remis %cst_5, %cst_5 : vector<42xi32>
|
|
|
|
%38 = remis %vci32, %vci32 : vector<42 x i32>
|
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = remis %cst_4, %cst_4 : tensor<42xi32>
|
|
|
|
%39 = remis %tci32, %tci32 : tensor<42 x i32>
|
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = remis %arg2, %arg2 : i32
|
2019-03-03 10:03:03 +08:00
|
|
|
%40 = "std.remis"(%i, %i) : (i32, i32) -> i32
|
2019-01-07 06:08:42 +08:00
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = remiu %arg2, %arg2 : i32
|
|
|
|
%41 = remiu %i, %i : i32
|
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = remiu %arg3, %arg3 : index
|
|
|
|
%42 = remiu %idx, %idx : index
|
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = remiu %cst_5, %cst_5 : vector<42xi32>
|
|
|
|
%43 = remiu %vci32, %vci32 : vector<42 x i32>
|
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = remiu %cst_4, %cst_4 : tensor<42xi32>
|
|
|
|
%44 = remiu %tci32, %tci32 : tensor<42 x i32>
|
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = remiu %arg2, %arg2 : i32
|
2019-03-03 10:03:03 +08:00
|
|
|
%45 = "std.remiu"(%i, %i) : (i32, i32) -> i32
|
2019-01-07 06:08:42 +08:00
|
|
|
|
2019-02-21 22:30:53 +08:00
|
|
|
// CHECK: %{{[0-9]+}} = divf %arg1, %arg1 : f32
|
2019-03-03 10:03:03 +08:00
|
|
|
%46 = "std.divf"(%f, %f) : (f32,f32) -> f32
|
2019-02-21 22:30:53 +08:00
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = divf %arg1, %arg1 : f32
|
|
|
|
%47 = divf %f, %f : f32
|
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = divf %arg0, %arg0 : tensor<4x4x?xf32>
|
|
|
|
%48 = divf %t, %t : tensor<4x4x?xf32>
|
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = remf %arg1, %arg1 : f32
|
2019-03-03 10:03:03 +08:00
|
|
|
%49 = "std.remf"(%f, %f) : (f32,f32) -> f32
|
2019-02-21 22:30:53 +08:00
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = remf %arg1, %arg1 : f32
|
|
|
|
%50 = remf %f, %f : f32
|
|
|
|
|
|
|
|
// CHECK: %{{[0-9]+}} = remf %arg0, %arg0 : tensor<4x4x?xf32>
|
|
|
|
%51 = remf %t, %t : tensor<4x4x?xf32>
|
|
|
|
|
2018-07-25 01:13:31 +08:00
|
|
|
return
|
|
|
|
}
|
|
|
|
|
2019-01-03 02:20:00 +08:00
|
|
|
// CHECK-LABEL: func @affine_apply() {
|
|
|
|
func @affine_apply() {
|
2019-03-03 10:03:03 +08:00
|
|
|
%i = "std.constant"() {value: 0: index} : () -> index
|
|
|
|
%j = "std.constant"() {value: 1: index} : () -> index
|
2018-07-25 01:13:31 +08:00
|
|
|
|
2019-02-07 03:08:18 +08:00
|
|
|
// CHECK: affine.apply #map0(%c0)
|
|
|
|
%a = "affine.apply" (%i) { map: (d0) -> (d0 + 1) } :
|
2018-10-07 08:21:53 +08:00
|
|
|
(index) -> (index)
|
2018-07-25 01:13:31 +08:00
|
|
|
|
2019-02-07 03:08:18 +08:00
|
|
|
// CHECK: affine.apply #map1()[%c0]
|
|
|
|
%b = affine.apply ()[x] -> (x+1)()[%i]
|
2018-07-29 00:36:25 +08:00
|
|
|
|
2018-07-25 01:13:31 +08:00
|
|
|
return
|
2018-07-26 02:15:20 +08:00
|
|
|
}
|
|
|
|
|
2019-01-03 02:20:00 +08:00
|
|
|
// CHECK-LABEL: func @load_store
|
|
|
|
func @load_store(memref<4x4xi32>, index) {
|
2018-12-30 03:32:37 +08:00
|
|
|
^bb0(%0: memref<4x4xi32>, %1: index):
|
2018-08-02 01:43:18 +08:00
|
|
|
// CHECK: %0 = load %arg0[%arg1, %arg1] : memref<4x4xi32>
|
2019-03-03 10:03:03 +08:00
|
|
|
%2 = "std.load"(%0, %1, %1) : (memref<4x4xi32>, index, index)->i32
|
2018-07-26 02:15:20 +08:00
|
|
|
|
2018-08-02 01:43:18 +08:00
|
|
|
// CHECK: %1 = load %arg0[%arg1, %arg1] : memref<4x4xi32>
|
2018-07-26 03:55:50 +08:00
|
|
|
%3 = load %0[%1, %1] : memref<4x4xi32>
|
2018-07-26 02:15:20 +08:00
|
|
|
|
|
|
|
return
|
|
|
|
}
|
|
|
|
|
2019-01-03 02:20:00 +08:00
|
|
|
// CHECK-LABEL: func @return_op(%arg0: i32) -> i32 {
|
|
|
|
func @return_op(%a : i32) -> i32 {
|
2018-08-10 03:28:58 +08:00
|
|
|
// CHECK: return %arg0 : i32
|
2019-03-03 10:03:03 +08:00
|
|
|
"std.return" (%a) : (i32)->()
|
2018-08-10 03:28:58 +08:00
|
|
|
}
|
2018-08-22 08:55:22 +08:00
|
|
|
|
2019-01-03 02:20:00 +08:00
|
|
|
// CHECK-LABEL: func @calls(%arg0: i32) {
func @calls(%arg0: i32) {
  // CHECK: %0 = call @return_op(%arg0) : (i32) -> i32
  %x = call @return_op(%arg0) : (i32) -> i32
  // CHECK: %1 = call @return_op(%0) : (i32) -> i32
  %y = call @return_op(%x) : (i32) -> i32
  // CHECK: %2 = call @return_op(%0) : (i32) -> i32
  %z = "std.call"(%x) {callee: @return_op : (i32) -> i32} : (i32) -> i32

  // CHECK: %f = constant @affine_apply : () -> ()
  %f = constant @affine_apply : () -> ()

  // CHECK: call_indirect %f() : () -> ()
  call_indirect %f() : () -> ()

  // CHECK: %f_0 = constant @return_op : (i32) -> i32
  %f_0 = constant @return_op : (i32) -> i32

  // CHECK: %3 = call_indirect %f_0(%arg0) : (i32) -> i32
  %2 = call_indirect %f_0(%arg0) : (i32) -> i32

  // CHECK: %4 = call_indirect %f_0(%arg0) : (i32) -> i32
  %3 = "std.call_indirect"(%f_0, %arg0) : ((i32) -> i32, i32) -> i32

  return
}
// CHECK-LABEL: func @extract_element(%arg0: tensor<*xi32>, %arg1: tensor<4x4xf32>) -> i32 {
func @extract_element(%arg0: tensor<*xi32>, %arg1 : tensor<4x4xf32>) -> i32 {
  %c0 = "std.constant"() {value: 0: index} : () -> index

  // CHECK: %0 = extract_element %arg0[%c0, %c0, %c0, %c0] : tensor<*xi32>
  %0 = extract_element %arg0[%c0, %c0, %c0, %c0] : tensor<*xi32>

  // CHECK: %1 = extract_element %arg1[%c0, %c0] : tensor<4x4xf32>
  %1 = extract_element %arg1[%c0, %c0] : tensor<4x4xf32>

  return %0 : i32
}
// CHECK-LABEL: func @tensor_cast(%arg0
func @tensor_cast(%arg0: tensor<*xf32>, %arg1 : tensor<4x4xf32>, %arg2: tensor<?x?xf32>) {
  // CHECK: %0 = tensor_cast %arg0 : tensor<*xf32> to tensor<?x?xf32>
  %0 = tensor_cast %arg0 : tensor<*xf32> to tensor<?x?xf32>

  // CHECK: %1 = tensor_cast %arg1 : tensor<4x4xf32> to tensor<*xf32>
  %1 = tensor_cast %arg1 : tensor<4x4xf32> to tensor<*xf32>

  // CHECK: %2 = tensor_cast %arg2 : tensor<?x?xf32> to tensor<4x?xf32>
  %2 = tensor_cast %arg2 : tensor<?x?xf32> to tensor<4x?xf32>

  // CHECK: %3 = tensor_cast %2 : tensor<4x?xf32> to tensor<?x?xf32>
  %3 = tensor_cast %2 : tensor<4x?xf32> to tensor<?x?xf32>

  return
}
// CHECK-LABEL: func @memref_cast(%arg0
func @memref_cast(%arg0: memref<4xf32>, %arg1 : memref<?xf32>) {
  // CHECK: %0 = memref_cast %arg0 : memref<4xf32> to memref<?xf32>
  %0 = memref_cast %arg0 : memref<4xf32> to memref<?xf32>

  // CHECK: %1 = memref_cast %arg1 : memref<?xf32> to memref<4xf32>
  %1 = memref_cast %arg1 : memref<?xf32> to memref<4xf32>
  return
}
// CHECK-LABEL: func @test_dimop(%arg0
func @test_dimop(%arg0: tensor<4x4x?xf32>) {
  // CHECK: %0 = dim %arg0, 2 : tensor<4x4x?xf32>
  %0 = dim %arg0, 2 : tensor<4x4x?xf32>
  // use dim as an affine_int to ensure type correctness
  %1 = affine.apply (d0) -> (d0)(%0)
  return
}
[MLIR] Add VectorTransferOps
This CL implements and uses VectorTransferOps in lieu of the former custom
call op. Tests are updated accordingly.
VectorTransferOps come in 2 flavors: VectorTransferReadOp and
VectorTransferWriteOp.
VectorTransferOps can be thought of as a backend-independent
pseudo op/library call that needs to be legalized to MLIR (whiteboxed) before
it can be lowered to backend-dependent IR.
Note that the current implementation does not yet support a real permutation
map. Proper support will come in a followup CL.
VectorTransferReadOp
====================
VectorTransferReadOp performs a blocking read from a scalar memref
location into a super-vector of the same elemental type. This operation is
called 'read' by opposition to 'load' because the super-vector granularity
is generally not representable with a single hardware register. As a
consequence, memory transfers will generally be required when lowering
VectorTransferReadOp. A VectorTransferReadOp is thus a mid-level abstraction
that supports super-vectorization with non-effecting padding for full-tile
only code.
A vector transfer read has semantics similar to a vector load, with additional
support for:
1. an optional value of the elemental type of the MemRef. This value
supports non-effecting padding and is inserted in places where the
vector read exceeds the MemRef bounds. If the value is not specified,
the access is statically guaranteed to be within bounds;
2. an attribute of type AffineMap to specify a slice of the original
MemRef access and its transposition into the super-vector shape. The
permutation_map is an unbounded AffineMap that must represent a
permutation from the MemRef dim space projected onto the vector dim
space.
Example:
```mlir
%A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>
...
%val = `ssa-value` : f32
// let %i, %j, %k, %l be ssa-values of type index
%v0 = vector_transfer_read %src, %i, %j, %k, %l
{permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
(memref<?x?x?x?xf32>, index, index, index, index) ->
vector<16x32x64xf32>
%v1 = vector_transfer_read %src, %i, %j, %k, %l, %val
{permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
(memref<?x?x?x?xf32>, index, index, index, index, f32) ->
vector<16x32x64xf32>
```
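The read semantics described above (walk the memref along the dimensions selected by the permutation map, and substitute the padding value wherever the access leaves the memref bounds) can be modelled in a few lines of Python. This is only an illustrative sketch, not how MLIR implements or lowers the op: `transfer_read`, `memref_shape`, the nested-list memref and the `perm` tuple encoding of the permutation map are all hypothetical names chosen for this example, and raising an error when no padding is given is an assumption standing in for the "statically guaranteed to be within bounds" contract.

```python
from itertools import product


def memref_shape(memref):
    """Return the shape of a rectangular nested-list 'memref' (hypothetical model)."""
    shape = []
    while isinstance(memref, list):
        shape.append(len(memref))
        memref = memref[0]
    return shape


def transfer_read(memref, indices, vector_shape, perm, padding=None):
    """Model of a padded vector transfer read (illustrative only).

    perm[k] is the memref dimension that vector dimension k advances,
    e.g. perm=(1, 0) models the map (d0, d1) -> (d1, d0).
    Returns a dict mapping each vector index tuple to the element read.
    """
    shape = memref_shape(memref)
    result = {}
    for vec_idx in product(*(range(n) for n in vector_shape)):
        # Start at the base indices and offset the permuted memref dims.
        pos = list(indices)
        for vec_dim, mem_dim in enumerate(perm):
            pos[mem_dim] += vec_idx[vec_dim]
        if all(0 <= p < s for p, s in zip(pos, shape)):
            elem = memref
            for p in pos:
                elem = elem[p]
        elif padding is not None:
            # Out-of-bounds positions receive the padding value.
            elem = padding
        else:
            # Assumption of this sketch: without padding the access must be
            # in bounds, so an out-of-bounds position is reported as an error.
            raise IndexError(f"out-of-bounds read at {tuple(pos)} without padding")
        result[vec_idx] = elem
    return result


# 3x4 memref whose element at (r, c) is 10*r + c.
A = [[10 * r + c for c in range(4)] for r in range(3)]
# Read 3 elements along d0 starting at (1, 2); the last one falls off the end.
v = transfer_read(A, (1, 2), (3,), perm=(0,), padding=-1)
```

In this model the third element of `v` is the padding value, because row 3 does not exist in the 3x4 memref.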
VectorTransferWriteOp
=====================
VectorTransferWriteOp performs a blocking write from a super-vector to
a scalar memref of the same elemental type. This operation is
called 'write' by opposition to 'store' because the super-vector
granularity is generally not representable with a single hardware register. As
a consequence, memory transfers will generally be required when lowering
VectorTransferWriteOp. A VectorTransferWriteOp is thus a mid-level
abstraction that supports super-vectorization with non-effecting padding
for full-tile only code.
A vector transfer write has semantics similar to a vector store, with
additional support for handling out-of-bounds situations.
Example:
```mlir
%A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>.
%val = `ssa-value` : vector<16x32x64xf32>
// let %i, %j, %k, %l be ssa-values of type index
vector_transfer_write %val, %src, %i, %j, %k, %l
{permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
(vector<16x32x64xf32>, memref<?x?x?x?xf32>, index, index, index, index)
```
PiperOrigin-RevId: 223873234
2018-12-04 07:21:27 +08:00
Cleanup SuperVectorization dialect printing and parsing.
On the read side,
```
%3 = vector_transfer_read %arg0, %i2, %i1, %i0 {permutation_map: (d0, d1, d2)->(d2, d0)} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
```
becomes:
```
%3 = vector_transfer_read %arg0[%i2, %i1, %i0] {permutation_map: (d0, d1, d2)->(d2, d0)} : memref<?x?x?xf32>, vector<32x256xf32>
```
On the write side,
```
vector_transfer_write %0, %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d0)} : vector<128xf32>, memref<?x?xf32>, index, index
```
becomes
```
vector_transfer_write %0, %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0)} : vector<128xf32>, memref<?x?xf32>
```
Documentation will be cleaned up in a followup commit that also extracts a proper .md from the top of the file comments.
PiperOrigin-RevId: 241021879
2019-03-30 02:48:20 +08:00
// CHECK-LABEL: func @test_vector.transfer_ops(%arg0
func @test_vector.transfer_ops(%arg0: memref<?x?xf32>) {
  %c3 = constant 3 : index
  %cst = constant 3.0 : f32
  // CHECK: %0 = vector.transfer_read %arg0[%c3, %c3] {permutation_map: #[[map_proj_d0d1_d0]]} : memref<?x?xf32>, vector<128xf32>
  %0 = vector.transfer_read %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0)} : memref<?x?xf32>, vector<128xf32>
  // CHECK: %1 = vector.transfer_read %arg0[%c3, %c3] {permutation_map: #[[map_proj_d0d1_d1d0]]} : memref<?x?xf32>, vector<3x7xf32>
  %1 = vector.transfer_read %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d1, d0)} : memref<?x?xf32>, vector<3x7xf32>
  // CHECK: %2 = vector.transfer_read %arg0[%c3, %c3], (%cst) {permutation_map: #[[map_proj_d0d1_d0]]} : memref<?x?xf32>, vector<128xf32>
  %2 = vector.transfer_read %arg0[%c3, %c3], (%cst) {permutation_map: (d0, d1)->(d0)} : memref<?x?xf32>, vector<128xf32>
  // CHECK: %3 = vector.transfer_read %arg0[%c3, %c3], (%cst) {permutation_map: #[[map_proj_d0d1_d1]]} : memref<?x?xf32>, vector<128xf32>
  %3 = vector.transfer_read %arg0[%c3, %c3], (%cst) {permutation_map: (d0, d1)->(d1)} : memref<?x?xf32>, vector<128xf32>
  //
  // CHECK: vector.transfer_write %0, %arg0[%c3, %c3] {permutation_map: #[[map_proj_d0d1_d0]]} : vector<128xf32>, memref<?x?xf32>
  vector.transfer_write %0, %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0)} : vector<128xf32>, memref<?x?xf32>
  // CHECK: vector.transfer_write %1, %arg0[%c3, %c3] {permutation_map: #[[map_proj_d0d1_d1d0]]} : vector<3x7xf32>, memref<?x?xf32>
  vector.transfer_write %1, %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d1, d0)} : vector<3x7xf32>, memref<?x?xf32>
  return
}