# MLIR Passes

This document describes the available MLIR passes and their contracts.

[TOC]

## Affine control lowering (`-lower-affine`)

Convert operations related to affine control into a graph of blocks using
operations from the Standard dialect.

Loop statements are converted to a subgraph of blocks (initialization, condition
checking, subgraph of body blocks), with the loop induction variable passed as
the block argument of the condition checking block. Conditional statements are
converted to a subgraph of blocks (a chain of condition checks with
short-circuit logic, and subgraphs of 'then' and 'else' body blocks). `affine.apply`
operations are converted into sequences of primitive arithmetic operations that
have the same effect, using operands of the `index` type. Consequently, named
maps and sets may be removed from the module.

For example, `%r = affine.apply (d0, d1)[s0] -> (d0 + 2*d1 + s0)(%d0, %d1)[%s0]`
can be converted into:

```mlir
%d0 = <...>
%d1 = <...>
%s0 = <...>
%0 = constant 2 : index
%1 = muli %0, %d1
%2 = addi %d0, %1
%r = addi %2, %s0
```
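
The equivalence between the map and the lowered sequence can be checked with a small model. The following is a minimal Python sketch, not MLIR code: `affine_apply` and `lowered` are hypothetical helper names, and Python integers stand in for values of the `index` type.

```python
def affine_apply(d0: int, d1: int, s0: int) -> int:
    """Evaluate the map (d0, d1)[s0] -> (d0 + 2*d1 + s0) directly."""
    return d0 + 2 * d1 + s0


def lowered(d0: int, d1: int, s0: int) -> int:
    """Mirror the lowered sequence, one primitive operation per line."""
    c2 = 2        # %0 = constant 2 : index
    t1 = c2 * d1  # %1 = muli %0, %d1
    t2 = d0 + t1  # %2 = addi %d0, %1
    r = t2 + s0   # %r = addi %2, %s0
    return r


# Both forms agree on every sampled input.
for d0 in range(-3, 4):
    for d1 in range(-3, 4):
        for s0 in range(-3, 4):
            assert affine_apply(d0, d1, s0) == lowered(d0, d1, s0)
```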

### Input invariant

- no `tensor` types.

This restriction may be lifted in the future.

### Output IR

Functions with `affine.for` and `affine.if` operations eliminated. These
functions may contain operations from the Standard dialect in addition to those
already present before the pass.

### Invariants

- Functions without a body are not modified.
- The semantics of the other functions is preserved.
- Individual operations other than those mentioned above are not modified if
  they do not depend on the loop iterator value or on the result of
  `affine.apply`.
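
The loop lowering performed by this pass (initialization, condition checking, and body blocks, with the induction variable passed as a block argument) can be modeled outside MLIR. The sketch below is a plain-Python illustration under that assumption; `run_lowered_loop` and the block function names are hypothetical, and each block returns its successor together with the successor's arguments.

```python
def run_lowered_loop(lower: int, upper: int, step: int, body) -> None:
    """Execute `for iv = lower to upper step step` as a graph of blocks."""

    def init_block():
        # Initialization: branch to the condition block, passing the
        # lower bound as its block argument.
        return cond_block, (lower,)

    def cond_block(iv):
        # Condition checking: the induction variable is the block argument.
        if iv < upper:
            return body_block, (iv,)
        return None, ()  # leave the loop

    def body_block(iv):
        body(iv)
        # Branch back to the condition block with the next value.
        return cond_block, (iv + step,)

    block, args = init_block, ()
    while block is not None:
        block, args = block(*args)


visited = []
run_lowered_loop(0, 10, 3, visited.append)
assert visited == [0, 3, 6, 9]
```

Operations in the body that do not use `iv` are executed unchanged, which corresponds to the last invariant in the list above.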

## Conversion from Standard to LLVM IR dialect (`-lower-to-llvm`)

Convert Standard dialect operations into LLVM IR dialect operations.

### Input invariant

- operations including: arithmetic on integers and floats, constants, direct
  calls, returns and branches;
- no `tensor` types;
- all `vector` types are one-dimensional;
- all blocks are reachable by following the successors of the first basic
  block.

If other operations are present and their results are required by the LLVM IR
dialect operations, the pass will fail. Any LLVM IR operations or types already
present in the IR will be kept as is.

### Output IR

Functions converted to LLVM IR. Function argument types are converted
one-to-one. Function results are converted one-to-one and, if more than one
value is returned, packed into an LLVM IR struct type. Function calls and
returns are updated accordingly. Block argument types are updated to use LLVM IR
types.
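
The signature rule above can be sketched as follows. This is an illustrative Python model only: `convert_signature` and `to_llvm` are hypothetical names, types are plain strings, and the `llvm.` prefix and `struct<...>` spelling are placeholders rather than the actual LLVM IR dialect type syntax.

```python
from typing import Callable, List, Tuple


def convert_signature(
    arg_types: List[str],
    result_types: List[str],
    convert_type: Callable[[str], str],
) -> Tuple[List[str], List[str]]:
    """Convert a function signature following the rules stated above."""
    new_args = [convert_type(t) for t in arg_types]        # one-to-one
    new_results = [convert_type(t) for t in result_types]  # one-to-one
    if len(new_results) > 1:
        # More than one result: pack them into a single struct type.
        new_results = ["struct<" + ", ".join(new_results) + ">"]
    return new_args, new_results


def to_llvm(t: str) -> str:
    # Placeholder spelling; the real LLVM IR dialect type syntax differs.
    return "llvm." + t


args, results = convert_signature(["i32", "f32"], ["i32", "f32"], to_llvm)
assert args == ["llvm.i32", "llvm.f32"]
assert results == ["struct<llvm.i32, llvm.f32>"]
```

A function with zero or one result keeps an unpacked result list in this model; only the multi-result case introduces the struct.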

## Data Copy DMA generation (`-affine-data-copy-generate`)

Replaces all loads and stores on memrefs living in `slowMemorySpace` by
introducing DMA operations (strided DMA if necessary) to transfer data to/from
`fastMemorySpace` and rewriting the original loads/stores to instead
load/store from the allocated fast memory buffers. Additional options specify
the identifier corresponding to the fast memory space and the amount of fast
memory space available. The pass traverses the nesting structure,
recursing to inner levels if necessary, to determine at what depth DMA transfers
need to be placed so that the allocated buffers fit within the memory capacity
provided. If this is not possible (for example, when the element type itself
is larger than the DMA capacity), an error with location information is
emitted. The DMA transfers are also hoisted up past all loops with respect to
which the transfers are invariant.

Input

```mlir
func @loop_nest_tiled() -> memref<256x1024xf32> {
  %0 = alloc() : memref<256x1024xf32>
  affine.for %i0 = 0 to 256 step 32 {
    affine.for %i1 = 0 to 1024 step 32 {
      affine.for %i2 = (d0) -> (d0)(%i0) to (d0) -> (d0 + 32)(%i0) {
        affine.for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {
          %1 = affine.load %0[%i2, %i3] : memref<256x1024xf32>
        }
      }
    }
  }
  return %0 : memref<256x1024xf32>
}
```

Output (with flags: `-affine-data-copy-generate -affine-data-copy-generate-fast-mem-space=2`)

```mlir
module {
  func @loop_nest_tiled() -> memref<256x1024xf32> {
    %c262144 = constant 262144 : index
    %c0 = constant 0 : index
    %0 = alloc() : memref<256x1024xf32>
    %1 = alloc() : memref<256x1024xf32, 2>
    %2 = alloc() : memref<1xi32>
    affine.dma_start %0[%c0, %c0], %1[%c0, %c0], %2[%c0], %c262144 : memref<256x1024xf32>, memref<256x1024xf32, 2>, memref<1xi32>
    affine.dma_wait %2[%c0], %c262144 : memref<1xi32>
    affine.for %arg0 = 0 to 256 step 32 {
      affine.for %arg1 = 0 to 1024 step 32 {
        affine.for %arg2 = #map1(%arg0) to #map2(%arg0) {
          affine.for %arg3 = #map1(%arg1) to #map2(%arg1) {
            %3 = affine.load %1[%arg2, %arg3] : memref<256x1024xf32, 2>
          }
        }
      }
    }
    dealloc %2 : memref<1xi32>
    dealloc %1 : memref<256x1024xf32, 2>
    return %0 : memref<256x1024xf32>
  }
}
```
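
The choice of placement depth can be illustrated with the loop nest above. The following Python sketch is a simplified model and not the pass's implementation: `placement_depth` is a hypothetical helper, the footprints are written out by hand for this one example, and the rule "pick the outermost depth whose buffer fits" is an assumption drawn from the description above. The depth-0 footprint of 262144 elements matches the element count passed to `affine.dma_start` in the output.

```python
from typing import List

ELEMENT_BYTES = 4  # size of one f32 element

# Elements of the 256x1024 memref touched when the copy is placed
# inside `depth` enclosing loops of the input loop nest above.
FOOTPRINT_ELEMENTS: List[int] = [
    256 * 1024,  # depth 0: outside all loops, the whole memref
    32 * 1024,   # depth 1: inside %i0
    32 * 32,     # depth 2: inside %i0 and %i1, one tile
    1 * 32,      # depth 3: inside %i2
    1,           # depth 4: inside %i3
]


def placement_depth(capacity_bytes: int) -> int:
    """Return the outermost depth whose buffer fits in fast memory."""
    for depth, elements in enumerate(FOOTPRINT_ELEMENTS):
        if elements * ELEMENT_BYTES <= capacity_bytes:
            return depth
    raise ValueError("a single element exceeds the fast memory capacity")


assert FOOTPRINT_ELEMENTS[0] == 262144
assert placement_depth(1024 * 1024) == 0  # whole memref fits in 1 MiB
assert placement_depth(4096) == 2         # only one 32x32 tile fits
```

With enough capacity for the whole memref, the copy is placed outside all loops, as in the output shown above.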

## Loop tiling (`-affine-loop-tile`)

Performs tiling or blocking of loop nests. It currently works on perfect loop
nests.
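
As an illustration of what tiling does to the iteration order, here is a minimal Python sketch of a two-dimensional perfect loop nest before and after tiling. The function names and the square tile size are assumptions made for the example; the pass itself operates on `affine.for` operations.

```python
def untiled(n: int, m: int):
    """Iteration points of the original two-deep perfect loop nest."""
    return [(i, j) for i in range(n) for j in range(m)]


def tiled(n: int, m: int, tile: int):
    """Iteration points of the same nest after tiling both loops."""
    points = []
    for ii in range(0, n, tile):          # inter-tile loops
        for jj in range(0, m, tile):
            # Intra-tile loops; `min` handles partial tiles at the edges.
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, m)):
                    points.append((i, j))
    return points


# Tiling reorders the iterations but visits exactly the same points.
assert sorted(tiled(5, 7, 3)) == untiled(5, 7)
assert tiled(4, 4, 2)[:4] == [(0, 0), (0, 1), (1, 0), (1, 1)]
```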

## Loop unroll (`-affine-loop-unroll`)

This pass implements loop unrolling. It can unroll loops with arbitrary
bounds and generates a cleanup loop when necessary.
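
The role of the cleanup loop can be seen in a small model. The sketch below is plain Python with a hypothetical helper name and a fixed unroll factor of 4 chosen for illustration; it shows the general technique rather than the code the pass emits.

```python
def unrolled_by_4(lower: int, upper: int, body) -> None:
    """Run `body(i)` for i in [lower, upper) with the loop unrolled by 4."""
    factor = 4
    trip_count = max(upper - lower, 0)
    # The main loop covers the largest multiple of `factor` iterations.
    main_upper = lower + (trip_count // factor) * factor
    i = lower
    while i < main_upper:
        body(i)
        body(i + 1)
        body(i + 2)
        body(i + 3)
        i += factor
    # Cleanup loop: the remaining (trip_count % factor) iterations.
    for j in range(main_upper, upper):
        body(j)


seen = []
unrolled_by_4(3, 13, seen.append)  # 10 iterations: 8 unrolled, 2 in cleanup
assert seen == list(range(3, 13))
```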

## Loop unroll and jam (`-affine-loop-unroll-jam`)

This pass implements unroll and jam for loops. It works on both perfect and
imperfect loop nests.
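
A minimal Python sketch of unroll and jam on a two-deep nest follows. The helper names and the unroll factor of 2 are assumptions for illustration. The outer loop is unrolled and the resulting copies of the inner loop are fused ("jammed") into one, which changes the iteration order and is therefore only valid when the dependences of the loop nest allow it.

```python
def original(n: int, m: int, body) -> None:
    for i in range(n):
        for j in range(m):
            body(i, j)


def unroll_jam_by_2(n: int, m: int, body) -> None:
    main = n - n % 2
    for i in range(0, main, 2):
        for j in range(m):
            body(i, j)      # copy for outer iteration i
            body(i + 1, j)  # copy for outer iteration i + 1, same inner loop
    for i in range(main, n):  # cleanup for an odd outer trip count
        for j in range(m):
            body(i, j)


before, after = [], []
original(3, 2, lambda i, j: before.append((i, j)))
unroll_jam_by_2(3, 2, lambda i, j: after.append((i, j)))
assert sorted(after) == before
assert after == [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (2, 1)]
```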

## Loop fusion (`-affine-loop-fusion`)

Performs fusion of loop nests using a slicing-based approach. The fused loop
nests, when possible, are rewritten to access significantly smaller local
buffers instead of the original memrefs, and the latter are often
either completely optimized away or contracted. This transformation leads to
enhanced locality and a lower memory footprint through the elimination or
contraction of temporaries / intermediate memrefs. These benefits are sometimes
achieved at the expense of redundant computation through a cost model that
evaluates available choices such as the depth at which a source slice should be
materialized in the destination slice.
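
The effect on intermediate buffers can be sketched with a producer loop and a consumer loop. This is a plain-Python illustration with hypothetical function names; it shows only the simplest case, where the intermediate buffer contracts to a single value and no computation is duplicated.

```python
def unfused(a):
    n = len(a)
    tmp = [0.0] * n        # intermediate buffer with n elements
    for i in range(n):     # producer loop nest
        tmp[i] = a[i] * 2.0
    out = [0.0] * n
    for i in range(n):     # consumer loop nest
        out[i] = tmp[i] + 1.0
    return out


def fused(a):
    n = len(a)
    out = [0.0] * n
    for i in range(n):
        t = a[i] * 2.0     # producer slice; the buffer contracts to one value
        out[i] = t + 1.0
    return out


data = [1.0, 2.0, 3.0]
assert fused(data) == unfused(data) == [3.0, 5.0, 7.0]
```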

## Memref bound checking (`-memref-bound-check`)

Checks all loads and stores on memrefs for out-of-bounds accesses, and reports
any such accesses (both overrun and underrun) with location information.

```mlir
test/Transforms/memref-bound-check.mlir:19:13: error: 'load' op memref out of upper bound access along dimension #2
       %x = load %A[%idx0, %idx1] : memref<9 x 9 x i32>
            ^
test/Transforms/memref-bound-check.mlir:19:13: error: 'load' op memref out of lower bound access along dimension #2
       %x = load %A[%idx0, %idx1] : memref<9 x 9 x i32>
            ^
```
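
The kind of check being reported can be sketched for a single dimension. The Python below is an illustrative model, not the pass's algorithm: `check_access` is a hypothetical helper that assumes the index has the form `coeff * i + offset` for a loop variable `i` in `[lower, upper)`, and the message strings only mimic the diagnostics above.

```python
def check_access(coeff: int, offset: int, lower: int, upper: int, dim_size: int):
    """Check index `coeff * i + offset`, i in [lower, upper), against [0, dim_size)."""
    if upper <= lower:
        return []  # the loop never executes, so nothing is accessed
    # An affine function of one variable takes its extremes at the endpoints.
    first = coeff * lower + offset
    last = coeff * (upper - 1) + offset
    lo, hi = min(first, last), max(first, last)
    errors = []
    if hi >= dim_size:
        errors.append("out of upper bound access")
    if lo < 0:
        errors.append("out of lower bound access")
    return errors


assert check_access(1, 0, 0, 9, 9) == []
assert check_access(1, 1, 0, 9, 9) == ["out of upper bound access"]
assert check_access(1, -1, 0, 9, 9) == ["out of lower bound access"]
```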

## Memref dataflow optimization (`-memref-dataflow-opt`)

This pass performs store-to-load forwarding for memrefs to eliminate memory
accesses and potentially the entire memref if all its accesses are forwarded.

Input

```mlir
func @store_load_affine_apply() -> memref<10x10xf32> {
  %cf7 = constant 7.0 : f32
  %m = alloc() : memref<10x10xf32>
  affine.for %i0 = 0 to 10 {
    affine.for %i1 = 0 to 10 {
      affine.store %cf7, %m[%i0, %i1] : memref<10x10xf32>
      %v0 = affine.load %m[%i0, %i1] : memref<10x10xf32>
      %v1 = addf %v0, %v0 : f32
    }
  }
  return %m : memref<10x10xf32>
}
```

Output

```mlir
module {
  func @store_load_affine_apply() -> memref<10x10xf32> {
    %cst = constant 7.000000e+00 : f32
    %0 = alloc() : memref<10x10xf32>
    affine.for %arg0 = 0 to 10 {
      affine.for %arg1 = 0 to 10 {
        affine.store %cst, %0[%arg0, %arg1] : memref<10x10xf32>
        %1 = addf %cst, %cst : f32
      }
    }
    return %0 : memref<10x10xf32>
  }
}
```
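
Store-to-load forwarding on the loop body above can be modeled on a flat list of operations. This Python sketch is a simplification under stated assumptions: operations are tuples, indices are compared as opaque strings, and any store to a memref conservatively discards earlier stores to that memref. `forward_stores` is a hypothetical name, and the actual pass reasons about affine accesses rather than string equality.

```python
def forward_stores(ops):
    """Forward stored values to later loads of the same memref and index."""
    last_store = {}  # (memref, index) -> stored value
    rename = {}      # removed load result -> forwarded value
    out = []
    for op in ops:
        kind = op[0]
        if kind == "store":
            _, value, memref, index = op
            value = rename.get(value, value)
            # Conservatively forget earlier stores to this memref.
            last_store = {k: v for k, v in last_store.items() if k[0] != memref}
            last_store[(memref, index)] = value
            out.append(("store", value, memref, index))
        elif kind == "load":
            _, result, memref, index = op
            if (memref, index) in last_store:
                rename[result] = last_store[(memref, index)]  # load removed
            else:
                out.append(op)
        else:
            name, result, *operands = op
            out.append((name, result, *[rename.get(x, x) for x in operands]))
    return out


body = [
    ("store", "%cf7", "%m", "%i0,%i1"),
    ("load", "%v0", "%m", "%i0,%i1"),
    ("addf", "%v1", "%v0", "%v0"),
]
assert forward_stores(body) == [
    ("store", "%cf7", "%m", "%i0,%i1"),
    ("addf", "%v1", "%cf7", "%cf7"),
]
```

As in the output above, the load disappears and `addf` consumes the stored constant directly. The store is kept because the memref is returned from the function.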

## Memref dependence analysis (`-memref-dependence-check`)

This pass performs dependence analysis to determine dependences between pairs of
memory operations (loads and stores) on memrefs. Dependence analysis exploits
the polyhedral information available (affine maps, expressions, and `affine.apply`
operations) to precisely represent dependences using affine constraints, while
also computing dependence vectors from them, where each component of the
dependence vector provides a lower and an upper bound on the dependence distance
along the corresponding dimension.

```mlir
test/Transforms/memref-dataflow-opt.mlir:232:7: note: dependence from 2 to 1 at depth 1 = ([1, 1], [-inf, +inf])
      store %cf9, %m[%idx] : memref<10xf32>
```
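
The meaning of one component of a dependence vector, a lower and an upper bound on the dependence distance, can be illustrated by brute force for a single loop. The Python sketch below enumerates iterations explicitly, which is an assumption made for clarity; the pass represents the same information symbolically with affine constraints. `dependence_distance` is a hypothetical helper.

```python
def dependence_distance(src_index, dst_index, lower: int, upper: int):
    """Bounds on j - i over source iterations i and destination iterations
    j >= i that access the same element; None if there is no dependence."""
    distances = [
        j - i
        for i in range(lower, upper)
        for j in range(i, upper)
        if src_index(i) == dst_index(j)
    ]
    return (min(distances), max(distances)) if distances else None


# A store to A[i] followed by a load of A[j - 1]: the distance is always 1.
assert dependence_distance(lambda i: i, lambda j: j - 1, 0, 10) == (1, 1)
# Every iteration touches A[0]: distances range from 0 to 9.
assert dependence_distance(lambda i: 0, lambda j: 0, 0, 10) == (0, 9)
# Even and odd elements never overlap: no dependence.
assert dependence_distance(lambda i: 2 * i, lambda j: 2 * j + 1, 0, 10) is None
```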

## Pipeline data transfer (`-affine-pipeline-data-transfer`)

This pass performs a transformation to overlap non-blocking DMA operations in a
loop with computations through double buffering. This is achieved by advancing
`dma_start` operations with respect to other operations.

Input

```mlir
func @pipelinedatatransfer() {
  %0 = alloc() : memref<256xf32>
  %1 = alloc() : memref<32xf32, 1>
  %2 = alloc() : memref<1xf32>
  %c0 = constant 0 : index
  %c128 = constant 128 : index
  affine.for %i0 = 0 to 8 {
    affine.dma_start %0[%i0], %1[%i0], %2[%c0], %c128 : memref<256xf32>, memref<32xf32, 1>, memref<1xf32>
    affine.dma_wait %2[%c0], %c128 : memref<1xf32>
    %3 = affine.load %1[%i0] : memref<32xf32, 1>
    %4 = "compute"(%3) : (f32) -> f32
    affine.store %4, %1[%i0] : memref<32xf32, 1>
  }
  return
}
```

Output

```mlir
module {
  func @pipelinedatatransfer() {
    %c8 = constant 8 : index
    %c0 = constant 0 : index
    %0 = alloc() : memref<256xf32>
    %c0_0 = constant 0 : index
    %c128 = constant 128 : index
    %1 = alloc() : memref<2x32xf32, 1>
    %2 = alloc() : memref<2x1xf32>
    affine.dma_start %0[%c0], %1[%c0 mod 2, %c0], %2[%c0 mod 2, symbol(%c0_0)], %c128 : memref<256xf32>, memref<2x32xf32, 1>, memref<2x1xf32>
    affine.for %arg0 = 1 to 8 {
      affine.dma_start %0[%arg0], %1[%arg0 mod 2, %arg0], %2[%arg0 mod 2, symbol(%c0_0)], %c128 : memref<256xf32>, memref<2x32xf32, 1>, memref<2x1xf32>
      %8 = affine.apply #map3(%arg0)
      %9 = affine.apply #map4(%8)
      %10 = affine.apply #map4(%8)
      affine.dma_wait %2[%8 mod 2, symbol(%c0_0)], %c128 : memref<2x1xf32>
      %11 = affine.load %1[%8 mod 2, %8] : memref<2x32xf32, 1>
      %12 = "compute"(%11) : (f32) -> f32
      affine.store %12, %1[%8 mod 2, %8] : memref<2x32xf32, 1>
    }
    %3 = affine.apply #map3(%c8)
    %4 = affine.apply #map4(%3)
    %5 = affine.apply #map4(%3)
    affine.dma_wait %2[%3 mod 2, symbol(%c0_0)], %c128 : memref<2x1xf32>
    %6 = affine.load %1[%3 mod 2, %3] : memref<2x32xf32, 1>
    %7 = "compute"(%6) : (f32) -> f32
    affine.store %7, %1[%3 mod 2, %3] : memref<2x32xf32, 1>
    dealloc %2 : memref<2x1xf32>
    dealloc %1 : memref<2x32xf32, 1>
    return
  }
}
```
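
The double-buffering schedule (a prologue that starts the first transfer, a steady-state loop that starts transfer `i` while computing on iteration `i - 1`, and an epilogue for the last iteration) can be sketched in Python. This is a sequential model with a hypothetical function name: copies complete instantly here, so it shows only the buffer indexing with `mod 2` and the ordering of operations, not any actual overlap.

```python
def pipelined(source, compute):
    """Process `source` with two alternating buffers, one iteration ahead."""
    n = len(source)
    results = []
    if n == 0:
        return results
    buffers = [None, None]          # two fast-memory buffers
    buffers[0] = source[0]          # prologue: dma_start for iteration 0
    for i in range(1, n):
        buffers[i % 2] = source[i]  # dma_start for iteration i
        # dma_wait for iteration i - 1, then compute on its buffer.
        results.append(compute(buffers[(i - 1) % 2]))
    # Epilogue: wait for and compute on the last iteration.
    results.append(compute(buffers[(n - 1) % 2]))
    return results


assert pipelined([1, 2, 3, 4], lambda x: x * 10) == [10, 20, 30, 40]
```

Because iteration `i` writes buffer `i mod 2` while iteration `i - 1` reads the other buffer, a transfer never overwrites data that is still being consumed.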