PR to master #8
README.md
@@ -69,6 +69,8 @@ MindSpore offers build options across multiple backends:
 | | Ubuntu-aarch64 | ✔️ |
+| | EulerOS-x86 | ✔️ |
 | | EulerOS-aarch64 | ✔️ |
+| | CentOS-x86 | ✔️ |
 | | CentOS-aarch64 | ✔️ |
 | GPU CUDA 10.1 | Ubuntu-x86 | ✔️ |
 | CPU | Ubuntu-x86 | ✔️ |
 | | Ubuntu-aarch64 | ✔️ |
@@ -79,7 +81,7 @@ For installation using `pip`, take `CPU` and `Ubuntu-x86` build version as an ex
 1. Download whl from [MindSpore download page](https://www.mindspore.cn/versions/en), and install the package.

 ```
-pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/0.7.0-beta/MindSpore/cpu/ubuntu_x86/mindspore-0.7.0-cp37-cp37m-linux_x86_64.whl
+pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.0.0/MindSpore/cpu/ubuntu_x86/mindspore-1.0.0-cp37-cp37m-linux_x86_64.whl
 ```

 2. Run the following command to verify the install.
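The verification command itself falls outside this hunk. A minimal smoke test for the CPU wheel might look like the following sketch (it assumes the 1.0-era `TensorAdd` operator name, which later releases renamed to `Add`):

```python
import numpy as np
from mindspore import Tensor, context
from mindspore.ops import operations as P

# target the CPU backend that the wheel above was built for
context.set_context(mode=context.GRAPH_MODE, device_target="CPU")

x = Tensor(np.ones([1, 3, 3, 4]).astype(np.float32))
y = Tensor(np.ones([1, 3, 3, 4]).astype(np.float32))
print(P.TensorAdd()(x, y))  # a tensor filled with 2.0 means the install works
```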
@@ -136,8 +138,8 @@ currently the containerized build options are supported as follows:

 For `CPU` backend, you can directly pull and run the latest stable image using the below command:
 ```
-docker pull mindspore/mindspore-cpu:0.7.0-beta
-docker run -it mindspore/mindspore-cpu:0.7.0-beta /bin/bash
+docker pull mindspore/mindspore-cpu:1.0.0
+docker run -it mindspore/mindspore-cpu:1.0.0 /bin/bash
 ```

 * GPU
@@ -170,8 +172,8 @@ currently the containerized build options are supported as follows:
 ```
 Then you can pull and run the latest stable image using the below command:
 ```
-docker pull mindspore/mindspore-gpu:0.7.0-beta
-docker run -it --runtime=nvidia --privileged=true mindspore/mindspore-gpu:0.7.0-beta /bin/bash
+docker pull mindspore/mindspore-gpu:1.0.0
+docker run -it --runtime=nvidia --privileged=true mindspore/mindspore-gpu:1.0.0 /bin/bash
 ```

 To test if the docker image works, please execute the python code below and check the output:
README_CN.md
@@ -66,6 +66,8 @@ MindSpore offers build options across multiple backends:
 | | Ubuntu-aarch64 | ✔️ |
+| | EulerOS-x86 | ✔️ |
 | | EulerOS-aarch64 | ✔️ |
+| | CentOS-x86 | ✔️ |
 | | CentOS-aarch64 | ✔️ |
 | GPU CUDA 10.1 | Ubuntu-x86 | ✔️ |
 | CPU | Ubuntu-x86 | ✔️ |
 | | Ubuntu-aarch64 | ✔️ |
@@ -76,7 +78,7 @@ MindSpore offers build options across multiple backends:
 1. Download and install the whl package from the [MindSpore download page](https://www.mindspore.cn/versions).

 ```
-pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/0.7.0-beta/MindSpore/cpu/ubuntu_x86/mindspore-0.7.0-cp37-cp37m-linux_x86_64.whl
+pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.0.0/MindSpore/cpu/ubuntu_x86/mindspore-1.0.0-cp37-cp37m-linux_x86_64.whl
 ```

 2. Run the following command to verify the installation.
@@ -132,8 +134,8 @@ MindSpore Docker images are hosted on [Docker Hub](https://hub.docker.com/r/mindspore

 For the `CPU` backend, you can directly pull and run the latest stable image with the following command:
 ```
-docker pull mindspore/mindspore-cpu:0.7.0-beta
-docker run -it mindspore/mindspore-cpu:0.7.0-beta /bin/bash
+docker pull mindspore/mindspore-cpu:1.0.0
+docker run -it mindspore/mindspore-cpu:1.0.0 /bin/bash
 ```

 * GPU
@@ -166,8 +168,8 @@ MindSpore Docker images are hosted on [Docker Hub](https://hub.docker.com/r/mindspore
 ```
 Pull and run the latest stable image with the following command:
 ```
-docker pull mindspore/mindspore-gpu:0.7.0-beta
-docker run -it --runtime=nvidia --privileged=true mindspore/mindspore-gpu:0.7.0-beta /bin/bash
+docker pull mindspore/mindspore-gpu:1.0.0
+docker run -it --runtime=nvidia --privileged=true mindspore/mindspore-gpu:1.0.0 /bin/bash
 ```

 To test whether Docker works properly, run the Python code below and check the output:
RELEASE.md
@@ -1,3 +1,112 @@
# Release 1.0.0

## Major Features and Improvements

### MindSpore Training and Inference Framework

#### Ascend 910

* New models
    * DenseNet121: a dense convolutional neural network, which connects each layer to every other layer in a feed-forward fashion, for object recognition on the ImageNet dataset.
    * UNet2D-Medical: UNet medical model for 2D image segmentation (Convolutional Networks for Biomedical Image Segmentation), on the ISBI Challenge database.
* Frontend and user interface
    * Second-Order Optimization
        * Enable second-order optimization for BERT on Ascend 910, which can achieve a masked LM accuracy of 71.3% in 800 seconds using 8 Ascend 910 devices (BERT-Large @MLPerf v0.7 dataset).
    * New GNN model BGCF
        * Bayesian Graph Convolutional Filtering network, which naturally incorporates the uncertainty in the user-item interaction graph and shows excellent recommendation performance on the Amazon-Beauty dataset.
    * Add append interface for SequentialCell.
    * Add a level `auto` for AMP.
* Executor and performance optimization
    * Support quantized networks (ResNet50, YOLOv3 & MobileNetV2).
    * Ease-of-use optimization: project compilation time optimization, CMakeLists regularization, and independent compilation and installation of cuDNN and CUDA.
* Data processing, augmentation, and save format
    * Support GeneratorDataset returning string type

#### Other Hardware Support

* GPU platform
    * Enable second-order optimization for ResNet50 on GPU, which achieves a 30% improvement in training time compared to SGD with Momentum (ResNet50 @ImageNet).
* CPU platform
    * ...

#### User interface change log

* Remove global object GradOperation in Autodiff([!5011](https://gitee.com/mindspore/mindspore/pulls/5011))
* Remove useless attribute 'name' in Autodiff([!5172](https://gitee.com/mindspore/mindspore/pulls/5172))
* Rectify distributed init([!5350](https://gitee.com/mindspore/mindspore/pulls/5350))
* Move the setting of ParallelMode from train.parallel_utils to context([!5351](https://gitee.com/mindspore/mindspore/pulls/5351))
* Modify save_checkpoint([!5482](https://gitee.com/mindspore/mindspore/pulls/5482))
* Wrap numpy random seed into an API([!5634](https://gitee.com/mindspore/mindspore/pulls/5634))
* Delete enable_fused_layernorm in some model zoo scripts([!5665](https://gitee.com/mindspore/mindspore/pulls/5665))
* Move 'multi-subgraphs' interface to internal([!5696](https://gitee.com/mindspore/mindspore/pulls/5696))
* Rename mirror_mean to gradient_mean([!5700](https://gitee.com/mindspore/mindspore/pulls/5700))
* Remove default value of 'group' of DepthWiseConv2d([!5865](https://gitee.com/mindspore/mindspore/pulls/5865))
* Modify interface for function and remove duplicated def([!5958](https://gitee.com/mindspore/mindspore/pulls/5958))
* Unify Conv2d and DepthwiseConv2d([!5916](https://gitee.com/mindspore/mindspore/pulls/5916))
* Modify SoftmaxCrossEntropyWithLogits([!5502](https://gitee.com/mindspore/mindspore/pulls/5502))
* Change API set_strategy() to shard()([!5991](https://gitee.com/mindspore/mindspore/pulls/5991))
* Move batch_size from bert_cfg_cfg to cfg([!6233](https://gitee.com/mindspore/mindspore/pulls/6233))
* Remove unused parameters from SummaryRecord __init__([!5548](https://gitee.com/mindspore/mindspore/pulls/5548))
* Remove sens parameter of TrainOneStepWithLossScaleCell([!5753](https://gitee.com/mindspore/mindspore/pulls/5753))
* Optimize the TrainOneStepCell for user definition([!6159](https://gitee.com/mindspore/mindspore/pulls/6159))
* Delete seed0 and seed1 of nn.Dropout([!5735](https://gitee.com/mindspore/mindspore/pulls/5735))
* Delete DataWrapper([!6101](https://gitee.com/mindspore/mindspore/pulls/6101))
* LSTM API optimization([!6374](https://gitee.com/mindspore/mindspore/pulls/6374))
* Merge P\C\F of ops([!5645](https://gitee.com/mindspore/mindspore/pulls/5645))
* Delete SoftmaxCrossEntropyExpand interface([!6607](https://gitee.com/mindspore/mindspore/pulls/6607))
* Adjust GroupNorm interface([!6329](https://gitee.com/mindspore/mindspore/pulls/6329))
* Modify init interface to internal interface([!6651](https://gitee.com/mindspore/mindspore/pulls/6651))
* Log optimization([!5842](https://gitee.com/mindspore/mindspore/pulls/5842))
* Remove useless API dataset.set_dataset_size([!5806](https://gitee.com/mindspore/mindspore/pulls/5806))
* Add usage parameter to some Dataset APIs([!5605](https://gitee.com/mindspore/mindspore/pulls/5605))
* Change the import path, such as from mindspore.dataset.transforms.vision to mindspore.dataset.vision.transforms([!5384](https://gitee.com/mindspore/mindspore/pulls/5384))
* Rename ImageFolderDatasetV2 to ImageFolderDataset([!5384](https://gitee.com/mindspore/mindspore/pulls/5384))
* Dataset.map parameter optimization([!5384](https://gitee.com/mindspore/mindspore/pulls/5384))
* Add new API dataset.get_col_names([!5384](https://gitee.com/mindspore/mindspore/pulls/5384))
* Remove useless API MindRecord finish([!5580](https://gitee.com/mindspore/mindspore/pulls/5580))

### MindSpore Lite

* Converter
    * Add 6 TFLite ops, 7 Caffe ops, and 1 ONNX op.
    * Add support for Windows.
    * Support parallel inference of multiple sessions to adapt to more scenarios.
    * Support 8-bit weight-only quantization; most mainstream models have a small accuracy loss (less than 0.5%) compared to the non-quantized FP32 model.

* CPU & GPU
    * Add 20 CPU ops, including FP32, int8/uint8, FP16, and int32 ops.
    * Add FP16 support for GPU; add 14 GPU ops including FP32/FP16.
    * Add Buffer/Image2D transform op for GPU.
    * Performance optimization for CPU ops, focused on ARM32.
    * Performance optimization for GPU convolution using Winograd.

* Tool & example
    * Add object detection Android demo.

## Bugfixes

* Models
    * Fix the constant folding problem in multiply.([!6092](https://gitee.com/mindspore/mindspore/pulls/6092))
    * Move batch_size from bert_net_cfg to cfg in BERT scripts.([!6233](https://gitee.com/mindspore/mindspore/pulls/6233))
    * Modify the checkpoint file path.([!6137](https://gitee.com/mindspore/mindspore/pulls/6137))
* Python API
    * Fix semi-auto parallel when the parameter of reshape has another user([!5722](https://gitee.com/mindspore/mindspore/pulls/5722))
    * Raise ValueError when calling a hook function in graph mode([!5831](https://gitee.com/mindspore/mindspore/pulls/5831))
* Executor
    * Fix PyNative mode to build temporary nn objects.([!6189](https://gitee.com/mindspore/mindspore/pulls/6189))
    * Fix the accuracy problem of multiple inputs of the multi-card communication operator broadcast.([!6522](https://gitee.com/mindspore/mindspore/pulls/5622))
    * Fix the problem that the sample distribution interface categorical does not support graph mode.([!5772](https://gitee.com/mindspore/mindspore/pulls/5772))
    * Fix the random seed failure problem of the polynomial downsampling distribution operator.([!5948](https://gitee.com/mindspore/mindspore/pulls/5948))
    * Fix unnecessary address binding issues in GPU heterogeneous scenarios.([!6232](https://gitee.com/mindspore/mindspore/pulls/6232))
* GPU platform
    * Fix a kernel resource leak([!5315](https://gitee.com/mindspore/mindspore/pulls/5315))
    * Fix insufficient memory for continuous unit test running([!5617](https://gitee.com/mindspore/mindspore/pulls/5617))
    * Fix the memory leak in the sparse slicer([!5578](https://gitee.com/mindspore/mindspore/pulls/5578))
* Data processing
    * Fix hang when using pyfunc([!6346](https://gitee.com/mindspore/mindspore/pulls/6346))
    * Fix GPU device queue not releasing the GIL during resource cleanup([!5964](https://gitee.com/mindspore/mindspore/pulls/5964))
    * Fix hang if a script exits abnormally([!6441](https://gitee.com/mindspore/mindspore/pulls/6441))
## Contributors

Thanks goes to these wonderful people:

Adel, AGroupofProbiotocs, anthonyaje, anzhengqi, askmiao, baihuawei, baiyangfan, bai-yangfan, bingyaweng, BowenK, buxue, caifubi, CaoJian, caojian05, caozhou, Cathy, changzherui, chenfei, chengxianbin, chenhaozhe, chenjianping, chenzomi, chenzupeng, chujinjin, cj, cjh9368, Corleone, danish, Danish, dayschan, eric, Eric, fary86, fuzhiye, Gaoxiong, gengdongjie, gongdaguo, gukecai, guoqi, gzhcv, hangq, hanhuifeng2020, Harshvardhan, He, heleiwang, hexia, Hoai, HuangBingjian, huangdongrun, huanghui, huangxinjing, huzhifeng, hwjiaorui, Jesse, jianghui58, jiangzhiwen, Jiaqi, jin-xiulang, jinyaohui, jjfeing, John, Jonathan, jonyguo, jzg, kai00, kingfo, kingxian, kpy, kswang, laiyongqiang, leonwanghui, Li, liangchenghui, liangzelang, lichen_101010, lichenever, lihongkang, lilei, limingqi107, ling, linqingke, liubuyu, liuwenhao4, liuxiao78, liuxiao93, liuyang_655, liuzhongkai, Lixia, lixian, liyanliu, liyong, lizhenyu, luoyang, lvchangquan, lvliang, lz, mahdi, Mahdi, maning202007, Margaret_wangrui, mayang, mengyuanli, nhussain, ougongchang, panfengfeng, panyifeng, Payne, Peilin, peixu_ren, Pengyongrong, qianlong, r1chardf1d0, riemann_penn, root, Sheng, shenwei41, simson, Simson, Su, sunsuodong, tao_yunhao, tinazhang, VectorSL, Wan, wandongdong, wangdongxu, wangmin, wangnan39@huawei.com, wangyue01, wangzhe, wanyiming, Wei, wenchunjiang, wilfChen, WilliamLian, wsc, wukesong, wuweikang, wuxuejian, Xiaoda, xiefangqi, xuanyue, xulei2020, Xun, xuyongfei, yanghaitao, yanghaitao1, yanghaoran, YangLuo, yangruoqi713, yankai, yanzhenxiang2020, yao_yf, yepei6, yeyunpeng, Yi, yoni, yoonlee666, yuchaojie, yujianfeng, yuximiao, zengzitao, Zhang, zhanghaibo5@huawei.com, zhanghuiyao, zhangyihui, zhangz0911gm, zhanke, zhanyuan, zhaodezan, zhaojichen, zhaoting, zhaozhenlong, zhengjun10, zhoufeng, zhousiyi, zhouyaqiang, Zichun, Zirui, Ziyan, zjun, ZPaC

Contributions of any kind are welcome!

# Release 0.7.0-beta

## Major Features and Improvements
akg

@@ -1 +1 @@
-Subproject commit 4d897c23fc41c5f7013efd0c517796233671518a
+Subproject commit 6c492a8c9d9730ad11ffc5481cc532ae500b0da5
build.bat
@@ -16,12 +16,13 @@
 @title mindspore_build

 SET BASEPATH=%CD%
-IF NOT EXIST "%BASEPATH%/build" (
+SET BUILD_PATH=%BASEPATH%/build
+
+IF NOT EXIST "%BUILD_PATH%" (
 md "build"
 )

-cd %BASEPATH%/build
-set BUILD_PATH=%CD%
+cd %BUILD_PATH%

 IF NOT EXIST "%BUILD_PATH%/mindspore" (
 md "mindspore"

@@ -38,7 +39,7 @@ IF "%1%" == "lite" (
 call :run_cmake
 IF errorlevel 1 (
 echo "cmake fail."
-call :run_fail
+goto run_fail
 )
 ) ELSE (
 call :gene_protobuf

@@ -53,7 +54,7 @@ IF "%1%" == "lite" (
 )
 IF errorlevel 1 (
 echo "build fail."
-call :run_fail
+goto run_fail
 ) ELSE (
 cd %BASEPATH%/output
 rd /s /q _CPack_Packages

@@ -63,7 +64,7 @@ IF "%1%" == "lite" (
 -G "CodeBlocks - MinGW Makefiles" ../..
 IF NOT %errorlevel% == 0 (
 echo "cmake fail."
-call :run_fail
+goto run_fail
 )

 IF "%1%" == "" (

@@ -73,7 +74,7 @@ IF "%1%" == "lite" (
 )
 IF NOT %errorlevel% == 0 (
 echo "build fail."
-call :run_fail
+goto run_fail
 )
 )

@@ -82,10 +83,14 @@ cd %BASEPATH%
 goto run_eof

 :run_cmake
+set VERSION_MAJOR=1
+set VERSION_MINOR=0
+set VERSION_REVISION=0
+echo "======Start building MindSpore Lite %VERSION_MAJOR%.%VERSION_MINOR%.%VERSION_REVISION%======"
 cd %BUILD_PATH%/mindspore
 cmake -DBUILD_DEVICE=on -DBUILD_CONVERTER=on -DPLATFORM_ARM64=off -DSUPPORT_TRAIN=off ^
 -DCMAKE_BUILD_TYPE=Release -DSUPPORT_GPU=off -DBUILD_MINDDATA=off -DOFFLINE_COMPILE=off ^
--DMS_VERSION_MAJOR=0 -DMS_VERSION_MINOR=7 -DMS_VERSION_REVISION=0 ^
+-DMS_VERSION_MAJOR=%VERSION_MAJOR% -DMS_VERSION_MINOR=%VERSION_MINOR% -DMS_VERSION_REVISION=%VERSION_REVISION% ^
 -G "CodeBlocks - MinGW Makefiles" "%BASEPATH%/mindspore/lite"
 GOTO:EOF

@@ -123,6 +128,5 @@ GOTO:EOF
 :run_fail
 cd %BASEPATH%
-set errorlevel=1
 EXIT

 :run_eof
build.sh
@@ -25,7 +25,7 @@ usage()
 echo "bash build.sh [-d] [-r] [-v] [-c on|off] [-t on|off] [-g on|off] [-h] [-b ge] [-m infer|train] \\"
 echo " [-a on|off] [-p on|off] [-i] [-L] [-R] [-D on|off] [-j[n]] [-e gpu|d|cpu] \\"
 echo " [-P on|off] [-z [on|off]] [-M on|off] [-V 9.2|10.1] [-I arm64|arm32|x86_64] [-K] \\"
-echo " [-B on|off] [-w on|off] [-E] [-l on|off] [-n full|lite|off] [-T on|off]"
+echo " [-B on|off] [-w on|off] [-E] [-l on|off] [-n full|lite|off]"
 echo ""
 echo "Options:"
 echo " -d Debug mode"

@@ -58,7 +58,6 @@ usage()
 echo " -B Enable debugger, default on"
 echo " -E Enable IBVERBS for parameter server, default off"
 echo " -l Compile with python dependency, default on"
-echo " -T Enable on-device training, default off"
 }

 # check value of input is 'on' or 'off'

@@ -237,7 +236,7 @@ checkopts()
 ;;
 z)
 eval ARG=\$\{$OPTIND\}
-if [[ -n $ARG && $ARG != -* ]]; then
+if [[ -n "$ARG" && "$ARG" != -* ]]; then
 OPTARG="$ARG"
 check_on_off $OPTARG z
 OPTIND=$((OPTIND + 1))
@@ -10,10 +10,10 @@ if (CMAKE_SYSTEM_NAME MATCHES "Windows")
 MD5 17757c84f49edd42d34ae8c9288110a1)
 else()
 mindspore_add_pkg(onednn
-VER 1.5
+VER 1.7
 LIBS dnnl mkldnn
-URL https://github.com/oneapi-src/oneDNN/archive/v1.5.tar.gz
-MD5 5d97e0e8f4c0b37da5f524533b7a644b
+URL https://github.com/oneapi-src/oneDNN/archive/v1.7-rc.tar.gz
+MD5 f5e3ce761f6521e235817299f882b965
 CMAKE_OPTION -DDNNL_ARCH_OPT_FLAGS='' -DDNNL_CPU_RUNTIME='SEQ' -DDNNL_BUILD_EXAMPLES=OFF -DDNNL_BUILD_TESTS=OFF)
 endif()
@@ -56,7 +56,7 @@ if (PLATFORM_ARM64)
 install(FILES ${TOP_DIR}/mindspore/lite/build/src/libmindspore-lite.so DESTINATION ${LIB_DIR} COMPONENT ${COMPONENT_NAME})
 install(FILES ${TOP_DIR}/mindspore/lite/build/src/libmindspore-lite.a DESTINATION ${LIB_DIR} COMPONENT ${COMPONENT_NAME})
 install(FILES ${TOP_DIR}/mindspore/core/ir/dtype/type_id.h DESTINATION ${INC_DIR}/ir/dtype COMPONENT ${COMPONENT_NAME})
-install(DIRECTORY ${TOP_DIR}/mindspore/lite/include/ DESTINATION ${INC_DIR} COMPONENT ${COMPONENT_NAME} FILES_MATCHING PATTERN "*.h")
+install(DIRECTORY ${TOP_DIR}/mindspore/lite/include/ DESTINATION ${INC_DIR} COMPONENT ${COMPONENT_NAME} FILES_MATCHING PATTERN "*.h" PATTERN "train_model.h" EXCLUDE PATTERN "train_session.h" EXCLUDE)
 install(DIRECTORY ${TOP_DIR}/mindspore/lite/schema/ DESTINATION ${INC_DIR}/schema COMPONENT ${COMPONENT_NAME} FILES_MATCHING PATTERN "*.h" PATTERN "inner" EXCLUDE)
 install(FILES ${TOP_DIR}/mindspore/lite/build/src/libmindspore-lite-optimize.so DESTINATION ${LIB_DIR} COMPONENT ${COMPONENT_NAME})
 install(FILES ${TOP_DIR}/mindspore/lite/build/src/libmindspore-lite-fp16.so DESTINATION ${LIB_DIR} COMPONENT ${COMPONENT_NAME})

@@ -65,7 +65,7 @@ elseif (PLATFORM_ARM32)
 install(FILES ${TOP_DIR}/mindspore/lite/build/src/libmindspore-lite.so DESTINATION ${LIB_DIR} COMPONENT ${COMPONENT_NAME})
 install(FILES ${TOP_DIR}/mindspore/lite/build/src/libmindspore-lite.a DESTINATION ${LIB_DIR} COMPONENT ${COMPONENT_NAME})
 install(FILES ${TOP_DIR}/mindspore/core/ir/dtype/type_id.h DESTINATION ${INC_DIR}/ir/dtype COMPONENT ${COMPONENT_NAME})
-install(DIRECTORY ${TOP_DIR}/mindspore/lite/include/ DESTINATION ${INC_DIR} COMPONENT ${COMPONENT_NAME} FILES_MATCHING PATTERN "*.h")
+install(DIRECTORY ${TOP_DIR}/mindspore/lite/include/ DESTINATION ${INC_DIR} COMPONENT ${COMPONENT_NAME} FILES_MATCHING PATTERN "*.h" PATTERN "train_model.h" EXCLUDE PATTERN "train_session.h" EXCLUDE)
 install(DIRECTORY ${TOP_DIR}/mindspore/lite/schema/ DESTINATION ${INC_DIR}/schema COMPONENT ${COMPONENT_NAME} FILES_MATCHING PATTERN "*.h" PATTERN "inner" EXCLUDE)
 install(DIRECTORY ${TOP_DIR}/third_party/flatbuffers/include DESTINATION ${FLATBF_DIR} COMPONENT ${COMPONENT_NAME})
 elseif (CMAKE_SYSTEM_NAME MATCHES "Windows")

@@ -75,7 +75,7 @@ elseif (CMAKE_SYSTEM_NAME MATCHES "Windows")
 install(FILES ${LIB_LIST} DESTINATION ${TOP_DIR}/build/mindspore/package COMPONENT ${COMPONENT_NAME})
 install(FILES ${TOP_DIR}/build/mindspore/tools/converter/libconverter_parser.a DESTINATION ${TOP_DIR}/build/mindspore/package COMPONENT ${PARSER_NAME})
 else ()
-install(DIRECTORY ${TOP_DIR}/mindspore/lite/include/ DESTINATION ${INC_DIR_RUN_X86} COMPONENT ${RUN_X86_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h")
+install(DIRECTORY ${TOP_DIR}/mindspore/lite/include/ DESTINATION ${INC_DIR_RUN_X86} COMPONENT ${RUN_X86_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h" PATTERN "train_model.h" EXCLUDE PATTERN "train_session.h" EXCLUDE)
 install(DIRECTORY ${TOP_DIR}/mindspore/lite/schema/ DESTINATION ${INC_DIR_RUN_X86}/schema COMPONENT ${RUN_X86_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h" PATTERN "inner" EXCLUDE)
 install(FILES ${TOP_DIR}/mindspore/core/ir/dtype/type_id.h DESTINATION ${INC_DIR_RUN_X86}/ir/dtype COMPONENT ${RUN_X86_COMPONENT_NAME})
 install(DIRECTORY ${TOP_DIR}/third_party/flatbuffers/include DESTINATION ${FLATBF_DIR_RUN_X86} COMPONENT ${RUN_X86_COMPONENT_NAME})
@@ -0,0 +1,67 @@
FROM ubuntu:18.04

MAINTAINER leonwanghui <leon.wanghui@huawei.com>

# Set env
ENV PYTHON_ROOT_PATH /usr/local/python-3.7.5
ENV PATH /usr/local/bin:$PATH

# Install base tools
RUN apt update \
    && DEBIAN_FRONTEND=noninteractive apt install -y \
    vim \
    wget \
    curl \
    xz-utils \
    net-tools \
    openssh-client \
    git \
    ntpdate \
    tzdata \
    tcl \
    sudo \
    bash-completion

# Install compile tools
RUN DEBIAN_FRONTEND=noninteractive apt install -y \
    gcc \
    g++ \
    zlibc \
    make \
    libgmp-dev \
    patch \
    autoconf \
    libtool \
    automake \
    flex

# Set bash
RUN echo "dash dash/sh boolean false" | debconf-set-selections
RUN DEBIAN_FRONTEND=noninteractive dpkg-reconfigure dash

# Install python (v3.7.5)
RUN apt install -y libffi-dev libssl-dev zlib1g-dev libbz2-dev libncurses5-dev \
    libgdbm-dev libgdbm-compat-dev liblzma-dev libreadline-dev libsqlite3-dev \
    && cd /tmp \
    && wget https://github.com/python/cpython/archive/v3.7.5.tar.gz \
    && tar -xvf v3.7.5.tar.gz \
    && cd /tmp/cpython-3.7.5 \
    && mkdir -p ${PYTHON_ROOT_PATH} \
    && ./configure --prefix=${PYTHON_ROOT_PATH} \
    && make -j4 \
    && make install -j4 \
    && rm -f /usr/local/bin/python \
    && rm -f /usr/local/bin/pip \
    && ln -s ${PYTHON_ROOT_PATH}/bin/python3.7 /usr/local/bin/python \
    && ln -s ${PYTHON_ROOT_PATH}/bin/pip3.7 /usr/local/bin/pip \
    && rm -rf /tmp/cpython-3.7.5 \
    && rm -f /tmp/v3.7.5.tar.gz

# Set pip source
RUN mkdir -pv /root/.pip \
    && echo "[global]" > /root/.pip/pip.conf \
    && echo "trusted-host=mirrors.aliyun.com" >> /root/.pip/pip.conf \
    && echo "index-url=http://mirrors.aliyun.com/pypi/simple/" >> /root/.pip/pip.conf

# Install MindSpore cpu whl package
RUN pip install --no-cache-dir https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.0.0/MindSpore/cpu/ubuntu_x86/mindspore-1.0.0-cp37-cp37m-linux_x86_64.whl
@@ -0,0 +1,81 @@
FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04

MAINTAINER leonwanghui <leon.wanghui@huawei.com>

# Set env
ENV PYTHON_ROOT_PATH /usr/local/python-3.7.5
ENV OMPI_ROOT_PATH /usr/local/openmpi-3.1.5
ENV PATH ${OMPI_ROOT_PATH}/bin:/usr/local/bin:$PATH
ENV LD_LIBRARY_PATH ${OMPI_ROOT_PATH}/lib:$LD_LIBRARY_PATH

# Install base tools
RUN apt update \
    && DEBIAN_FRONTEND=noninteractive apt install -y \
    vim \
    wget \
    curl \
    xz-utils \
    net-tools \
    openssh-client \
    git \
    ntpdate \
    tzdata \
    tcl \
    sudo \
    bash-completion

# Install compile tools
RUN DEBIAN_FRONTEND=noninteractive apt install -y \
    gcc \
    g++ \
    zlibc \
    make \
    libgmp-dev \
    patch \
    autoconf \
    libtool \
    automake \
    flex

# Set bash
RUN echo "dash dash/sh boolean false" | debconf-set-selections
RUN DEBIAN_FRONTEND=noninteractive dpkg-reconfigure dash

# Install python (v3.7.5)
RUN apt install -y libffi-dev libssl-dev zlib1g-dev libbz2-dev libncurses5-dev \
    libgdbm-dev libgdbm-compat-dev liblzma-dev libreadline-dev libsqlite3-dev \
    && cd /tmp \
    && wget https://github.com/python/cpython/archive/v3.7.5.tar.gz \
    && tar -xvf v3.7.5.tar.gz \
    && cd /tmp/cpython-3.7.5 \
    && mkdir -p ${PYTHON_ROOT_PATH} \
    && ./configure --prefix=${PYTHON_ROOT_PATH} \
    && make -j4 \
    && make install -j4 \
    && rm -f /usr/local/bin/python \
    && rm -f /usr/local/bin/pip \
    && ln -s ${PYTHON_ROOT_PATH}/bin/python3.7 /usr/local/bin/python \
    && ln -s ${PYTHON_ROOT_PATH}/bin/pip3.7 /usr/local/bin/pip \
    && rm -rf /tmp/cpython-3.7.5 \
    && rm -f /tmp/v3.7.5.tar.gz

# Set pip source
RUN mkdir -pv /root/.pip \
    && echo "[global]" > /root/.pip/pip.conf \
    && echo "trusted-host=mirrors.aliyun.com" >> /root/.pip/pip.conf \
    && echo "index-url=http://mirrors.aliyun.com/pypi/simple/" >> /root/.pip/pip.conf

# Install openmpi (v3.1.5)
RUN cd /tmp \
    && wget https://download.open-mpi.org/release/open-mpi/v3.1/openmpi-3.1.5.tar.gz \
    && tar -xvf openmpi-3.1.5.tar.gz \
    && cd /tmp/openmpi-3.1.5 \
    && mkdir -p ${OMPI_ROOT_PATH} \
    && ./configure --prefix=${OMPI_ROOT_PATH} \
    && make -j4 \
    && make install -j4 \
    && rm -rf /tmp/openmpi-3.1.5 \
    && rm -f /tmp/openmpi-3.1.5.tar.gz

# Install MindSpore cuda-10.1 whl package
RUN pip install --no-cache-dir https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.0.0/MindSpore/gpu/ubuntu_x86/cuda-10.1/mindspore_gpu-1.0.0-cp37-cp37m-linux_x86_64.whl
@@ -1 +1 @@
-Subproject commit 4f6ebe0f92da8c758b1b71a2bc7ff33f9344e18a
+Subproject commit 14db109491bc81473905a5eb9e82f6234aca419b
@@ -74,7 +74,11 @@ class InferTensorBase {
 if (data_len == 0) {
 return true;
 }
-memcpy_s(mutable_data(), data_size(), data, data_len);
+auto ret = memcpy_s(mutable_data(), data_size(), data, data_len);
+if (ret != 0) {
+  MSI_LOG_ERROR << "Set data memcpy_s failed";
+  return false;
+}
 return true;
 }
@@ -81,7 +81,6 @@ class MS_API InferSession {
 }
 static std::shared_ptr<InferSession> CreateSession(const std::string &device, uint32_t device_id);
 };

 } // namespace inference
 } // namespace mindspore
 #endif // MINDSPORE_INCLUDE_MS_SESSION_H
@@ -13,7 +13,6 @@
 # limitations under the License.
 # ============================================================================
 """tbe common"""
-import json
 import os

 class TBEException(Exception):

@@ -27,23 +26,6 @@ class TBEException(Exception):
 return self.__error_msg


-def get_ddk_version():
-    """get ddk version"""
-    ddk_version = os.environ.get("DDK_VERSION")
-    if ddk_version is None:
-        default_ddk_info_file = '/usr/local/HiAI/runtime/ddk_info'
-        backup_ddk_info_file = '/usr/local/Ascend/fwkacllib/ddk_info'
-        if os.path.exists(default_ddk_info_file):
-            with open(default_ddk_info_file, "r") as fp:
-                ddk_version = json.load(fp)["VERSION"]
-        elif os.path.exists(backup_ddk_info_file):
-            with open(backup_ddk_info_file, "r") as fp:
-                ddk_version = json.load(fp)["VERSION"]
-        else:
-            ddk_version = "Ascend910"
-    return ddk_version
-
-
 def get_build_in_impl_path():
     """get build-in tbe implement path"""
     tbe_impl_path = os.environ.get("TBE_IMPL_PATH")
@@ -18,9 +18,8 @@ import os
 import sys
 from te.platform.cce_conf import te_set_version
 from te.platform.fusion_util import fusion_op
-from common import check_kernel_info, get_args, get_build_in_impl_path, get_ddk_version
+from common import check_kernel_info, get_args, get_build_in_impl_path

-ddk_version = get_ddk_version()
 build_in_impl_path = get_build_in_impl_path()

 # op function list

@@ -30,7 +29,6 @@ fusion_pattern_end_flag = "fusion_pattern_end"

 def _initialize(impl_path):
     """Initialize"""
-    te_set_version(ddk_version)
     if impl_path == "":
         op_module_name = build_in_impl_path
     else:

@@ -53,7 +51,7 @@ def build_op(build_type, json_str):
     """
     kernel_info = json.loads(json_str)
     check_kernel_info(kernel_info)
-
+    te_set_version(kernel_info["op_info"]["socVersion"])
     op_name = kernel_info['op_info']['name']

     try:

@@ -111,7 +109,7 @@ def compile_fusion_op(json_str):
         Exception: If specific keyword is not found.
     """
     args = json.loads(json_str)
-    te_set_version(ddk_version)
+    te_set_version(args['fusion_op']["socVersion"])
     if 'fusion_op' not in args or not args['fusion_op']:
         raise ValueError("Json string Errors, key:fusion_op not found.")
     fusion_op_arg = args['fusion_op']
@@ -104,7 +104,7 @@ class TbeProcess:
     def __init__(self):
         self.__processe_num = multiprocessing.cpu_count()
-        max_processes_num = 16
+        # max_processes_num: Set the maximum number of concurrent processes for compiler
+        max_processes_num = 24
         if self.__processe_num > max_processes_num:
             self.__processe_num = max_processes_num
         self.__pool = None
@@ -122,15 +122,16 @@ class AscendEnvChecker(EnvChecker):

     def __init__(self):
         self.version = ["1.75.22.0.220"]
-        atlas_fwk_version = "/usr/local/Ascend/nnae/latest/fwkacllib/version.info"
+        atlas_fwk_version = "/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/version.info"
         hisi_fwk_version = "/usr/local/Ascend/fwkacllib/version.info"
         if os.path.exists(atlas_fwk_version):
             # atlas default path
-            self.fwk_path = "/usr/local/Ascend/nnae/latest/fwkacllib"
-            self.op_impl_path = "/usr/local/Ascend/nnae/latest/opp/op_impl/built-in/ai_core/tbe"
+            self.fwk_path = "/usr/local/Ascend/ascend-toolkit/latest/fwkacllib"
+            self.op_impl_path = "/usr/local/Ascend/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe"
             self.tbe_path = self.fwk_path + "/lib64"
             self.cce_path = self.fwk_path + "/ccec_compiler/bin"
             self.fwk_version = atlas_fwk_version
+            self.op_path = "/usr/local/Ascend/ascend-toolkit/latest/opp"
         elif os.path.exists(hisi_fwk_version):
             # hisi default path
             self.fwk_path = "/usr/local/Ascend/fwkacllib"

@@ -138,6 +139,7 @@ class AscendEnvChecker(EnvChecker):
             self.tbe_path = self.fwk_path + "/lib64"
             self.cce_path = self.fwk_path + "/ccec_compiler/bin"
             self.fwk_version = hisi_fwk_version
+            self.op_path = ""
         else:
             # custom or unknown environment
             self.fwk_path = ""

@@ -145,17 +147,20 @@ class AscendEnvChecker(EnvChecker):
             self.tbe_path = ""
             self.cce_path = ""
             self.fwk_version = ""
+            self.op_path = ""

         # env
         self.path = os.getenv("PATH")
         self.python_path = os.getenv("PYTHONPATH")
         self.ld_lib_path = os.getenv("LD_LIBRARY_PATH")
+        self.ascend_opp_path = os.getenv("ASCEND_OPP_PATH")

         # check content
         self.path_check = "/fwkacllib/ccec_compiler/bin/"
         self.python_path_check = "opp/op_impl/built_in/ai_core/tbe/"
         self.ld_lib_path_check_fwk = "/fwkacllib/lib64/"
         self.ld_lib_path_check_addons = "/add-ons/"
+        self.ascend_opp_path_check = "/op"
         self.v = ""

     def check_env(self, e):

@@ -205,6 +210,15 @@ class AscendEnvChecker(EnvChecker):
                 f"No such directory: {self.cce_path}, Please check if Ascend 910 AI software package is "
                 "installed correctly.")

+        if self.op_path is None:
+            pass
+        elif Path(self.op_path).is_dir():
+            os.environ['ASCEND_OPP_PATH'] = self.op_path
+        else:
+            raise EnvironmentError(
+                f"No such directory: {self.op_path}, Please check if Ascend 910 AI software package is "
+                "installed correctly.")
+
     def _check_env(self):
         """ascend dependence path check"""
         if self.path is None or self.path_check not in self.path:

@@ -223,6 +237,11 @@ class AscendEnvChecker(EnvChecker):
                 "LD_LIBRARY_PATH, you can reference to the installation guidelines "
                 "https://www.mindspore.cn/install")

+        if self.ascend_opp_path is None or self.ascend_opp_path_check not in self.ascend_opp_path:
+            logger.warning(
+                "Can not find opp path (need by mindspore-ascend), please check if you have set env ASCEND_OPP_PATH, "
+                "you can reference to the installation guidelines https://www.mindspore.cn/install")
+
     def _read_version(self, file_path):
         """get ascend version info"""
         with open(file_path, 'r') as f:
@@ -258,7 +258,7 @@ else ()
 endif ()
 set(MINDSPORE_RPATH ${ORIGIN_PATH}/lib)
 if (ENABLE_D)
-    set(MINDSPORE_RPATH ${MINDSPORE_RPATH}:/usr/local/Ascend/nnae/latest/fwkacllib/lib64)
+    set(MINDSPORE_RPATH ${MINDSPORE_RPATH}:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/lib64)
     set(MINDSPORE_RPATH ${MINDSPORE_RPATH}:/usr/local/Ascend/fwkacllib/lib64)
     set(MINDSPORE_RPATH ${MINDSPORE_RPATH}:/usr/local/Ascend/add-ons)
 elseif (ENABLE_GPU)
fused_batch_norm_grad_cpu_kernel.cc

@@ -0,0 +1,139 @@
/**
 * Copyright 2020 Huawei Technologies Co., Ltd
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include <string>
#include "backend/kernel_compiler/cpu/mkldnn/fused_batch_norm_grad_cpu_kernel.h"
#include "utils/ms_utils.h"
#include "backend/kernel_compiler/cpu/mkldnn/mkl_kernel_engine.h"
#include "runtime/device/cpu/cpu_device_address.h"

namespace mindspore {
namespace kernel {
void FusedBatchNormGradCPUKernel::InitInputOutputSize(const CNodePtr &kernel_node) {
  CPUKernel::InitInputOutputSize(kernel_node);
  MS_EXCEPTION_IF_NULL(kernel_node);
  size_t type_size = sizeof(float);
  std::vector<size_t> shape = AnfAlgo::GetInputDeviceShape(kernel_node, 0);

  // workspace of shape [2, C] stores diff scale and diff bias
  size_t tensor_size = shape[1] * 2 * type_size;
  workspace_size_list_.emplace_back(tensor_size);
}

void FusedBatchNormGradCPUKernel::InitKernel(const CNodePtr &kernel_node) {
  MS_EXCEPTION_IF_NULL(kernel_node);
  MS_LOG(INFO) << "start init FusedBatchNormGradCPUKernel";
  auto node_name = AnfAlgo::GetCNodeName(kernel_node);

  // create desc for dy and x
  std::vector<size_t> dy_shape = AnfAlgo::GetInputDeviceShape(kernel_node, 0);
  std::vector<size_t> x_shape = AnfAlgo::GetInputDeviceShape(kernel_node, 1);
  // dy and x must be NCHW, so check them
  if (dy_shape.size() != 4 || x_shape.size() != 4) {
    MS_LOG(EXCEPTION) << "Fused batchnorm grad only support nchw input!";
  }

  // desc for x, y, dy, dx
  batch_size = x_shape[0];
  channel = x_shape[1];
  hw_size = x_shape[2] * x_shape[3];
  nhw_size = x_shape[0] * hw_size;
  dnnl::memory::desc x_desc = GetDefaultMemDesc(x_shape);

  // create scale_bias_desc
  dnnl::memory::desc scale_desc = GetDefaultMemDesc({1, channel});

  auto epsilon = AnfAlgo::GetNodeAttr<float>(kernel_node, "epsilon");

  // define forward hint
  auto f_prop_kind = dnnl::prop_kind::forward_training;                    // forward prop kind
  auto f_normalization_flags = dnnl::normalization_flags::use_scale_shift; // forward normalization flags
  dnnl::batch_normalization_forward::desc fhint_desc =
    dnnl::batch_normalization_forward::desc(f_prop_kind, x_desc, epsilon, f_normalization_flags);
  auto fhint_prim_desc = dnnl::batch_normalization_forward::primitive_desc(fhint_desc, MKLKernelEngine::Get().engine());

  // define backward
  auto bprop_kind = dnnl::prop_kind::backward;
  auto bnormalization_flags = dnnl::normalization_flags::use_scale_shift;
  // create desc for backward
  dnnl::batch_normalization_backward::desc b_desc =
    dnnl::batch_normalization_backward::desc(bprop_kind, fhint_prim_desc.dst_desc(), x_desc, epsilon,
                                             bnormalization_flags);
  // create prim_desc for backward
  auto bprim_desc =
    dnnl::batch_normalization_backward::primitive_desc(b_desc, MKLKernelEngine::Get().engine(), fhint_prim_desc);

  primitive_ = std::make_shared<dnnl::batch_normalization_backward>(bprim_desc);

  // add arguments for backward
  AddArgument(DNNL_ARG_SRC, bprim_desc.src_desc());                        // x
  AddArgument(DNNL_ARG_DIFF_DST, bprim_desc.diff_dst_desc());              // dy
  AddArgument(DNNL_ARG_SCALE_SHIFT, scale_desc);                           // only need scale
  AddArgument(DNNL_ARG_MEAN, bprim_desc.mean_desc());                      // mean
  AddArgument(DNNL_ARG_VARIANCE, bprim_desc.variance_desc());              // var

  AddArgument(DNNL_ARG_DIFF_SRC, bprim_desc.diff_src_desc());              // dx
  AddArgument(DNNL_ARG_DIFF_SCALE_SHIFT, bprim_desc.diff_weights_desc());  // diff scale and diff bias
  AddArgument(DNNL_ARG_WORKSPACE, bprim_desc.workspace_desc());            // workspace
}

bool FusedBatchNormGradCPUKernel::Launch(const std::vector<kernel::AddressPtr> &inputs,
                                         const std::vector<kernel::AddressPtr> &workspace,
                                         const std::vector<kernel::AddressPtr> &outputs) {
  // inputs: dy, x, scale, save_mean, save_inv_variance (5 in total)
  MS_LOG(INFO) << "start FusedBatchNormGradCPUKernel";

  if (inputs.size() < 5 || outputs.empty()) {
    MS_LOG(EXCEPTION) << "Error input output size!";
  }

  SetArgumentHandle(DNNL_ARG_SRC, inputs[1]->addr);          // x
  SetArgumentHandle(DNNL_ARG_DIFF_DST, inputs[0]->addr);     // dy
  SetArgumentHandle(DNNL_ARG_SCALE_SHIFT, inputs[2]->addr);  // only need scale
  SetArgumentHandle(DNNL_ARG_MEAN, inputs[3]->addr);         // mean
  SetArgumentHandle(DNNL_ARG_VARIANCE, inputs[4]->addr);     // var

  SetArgumentHandle(DNNL_ARG_DIFF_SRC, outputs[0]->addr);            // dx
  SetArgumentHandle(DNNL_ARG_DIFF_SCALE_SHIFT, workspace[0]->addr);  // diff scale and diff bias
  ExecutePrimitive();

  // split the packed [2, C] workspace into the diff-scale and diff-bias outputs
  auto scale = reinterpret_cast<float *>(outputs[1]->addr);
  auto bias = reinterpret_cast<float *>(outputs[2]->addr);
  auto wksp = reinterpret_cast<float *>(workspace[0]->addr);

  auto scale_ret = memcpy_s(scale, outputs[1]->size, workspace[0]->addr, outputs[1]->size);
  auto bias_ret = memcpy_s(bias, outputs[2]->size, wksp + (outputs[1]->size / sizeof(float)), outputs[2]->size);

  if (scale_ret != 0 || bias_ret != 0) {
    MS_LOG(EXCEPTION) << "Memcpy_s error.";
    return false;
  }

  MS_LOG(INFO) << "done batchnorm grad";
  return true;
}
} // namespace kernel
} // namespace mindspore
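For reference, the two rows of the `[2, C]` DIFF_SCALE_SHIFT buffer that `Launch` copies into the scale and bias outputs are the standard batch-norm parameter gradients (the usual textbook form of what the `use_scale_shift` backward computes; the notation below is ours, not the patch's):

```latex
\hat{x}_{n,c,h,w} = \frac{x_{n,c,h,w} - \mu_c}{\sqrt{\sigma_c^2 + \epsilon}}, \qquad
\frac{\partial L}{\partial \gamma_c} = \sum_{n,h,w} \frac{\partial L}{\partial y_{n,c,h,w}} \, \hat{x}_{n,c,h,w}, \qquad
\frac{\partial L}{\partial \beta_c} = \sum_{n,h,w} \frac{\partial L}{\partial y_{n,c,h,w}}
```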
fused_batch_norm_grad_cpu_kernel.h

@@ -0,0 +1,63 @@
/**
 * Copyright 2020 Huawei Technologies Co., Ltd
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
#ifndef MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_FUSED_BATCH_NORM_GRAD_CPU_KERNEL_H_
#define MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_FUSED_BATCH_NORM_GRAD_CPU_KERNEL_H_

#include <vector>
#include <memory>
#include "backend/kernel_compiler/cpu/mkldnn/mkl_cpu_kernel.h"

namespace mindspore {
namespace kernel {
class FusedBatchNormGradCPUKernel : public MKLCPUKernel {
 public:
  FusedBatchNormGradCPUKernel() = default;
  ~FusedBatchNormGradCPUKernel() override = default;

  void InitKernel(const CNodePtr &kernel_node) override;

  bool Launch(const std::vector<AddressPtr> &inputs, const std::vector<AddressPtr> &workspace,
              const std::vector<AddressPtr> &outputs) override;

 protected:
  void InitInputOutputSize(const CNodePtr &kernel_node) override;

 private:
  size_t batch_size{0};
  size_t channel{0};
  size_t hw_size{0};
  size_t nhw_size{0};
};

// register FusedBatchNormGrad
MS_REG_CPU_KERNEL(FusedBatchNormGrad,
                  KernelAttr()
                    .AddInputAttr(kNumberTypeFloat32)
                    .AddInputAttr(kNumberTypeFloat32)
                    .AddInputAttr(kNumberTypeFloat32)
                    .AddInputAttr(kNumberTypeFloat32)
                    .AddInputAttr(kNumberTypeFloat32)
                    .AddOutputAttr(kNumberTypeFloat32)
                    .AddOutputAttr(kNumberTypeFloat32)
                    .AddOutputAttr(kNumberTypeFloat32),
                  FusedBatchNormGradCPUKernel)

} // namespace kernel
} // namespace mindspore

#endif // MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_FUSED_BATCH_NORM_GRAD_CPU_KERNEL_H_
realdiv_cpu_kernel.cc

@@ -0,0 +1,67 @@
/**
 * Copyright 2020 Huawei Technologies Co., Ltd
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
#include "backend/kernel_compiler/cpu/mkldnn/realdiv_cpu_kernel.h"
#include "backend/kernel_compiler/cpu/mkldnn/mkl_kernel_engine.h"
#include "runtime/device/cpu/cpu_device_address.h"
#include "utils/ms_utils.h"

namespace mindspore {
namespace kernel {
void RealDivCPUKernel::InitKernel(const CNodePtr &kernel_node) {
  MS_EXCEPTION_IF_NULL(kernel_node);
  MS_LOG(INFO) << "Init RealDiv";
  std::vector<size_t> src0_shape = AnfAlgo::GetInputDeviceShape(kernel_node, 0);
  std::vector<size_t> src1_shape = AnfAlgo::GetInputDeviceShape(kernel_node, 1);
  std::vector<size_t> dst_shape = AnfAlgo::GetOutputDeviceShape(kernel_node, 0);
  if (src0_shape.size() != src1_shape.size() && src1_shape.size() > 1) {
    MS_LOG(EXCEPTION) << "RealDivCPUKernel: RealDiv only supports same-rank inputs or tensor / scalar, got "
                      << src0_shape.size() << " vs " << src1_shape.size();
  }

  // pad the divisor's shape with trailing 1s so it broadcasts against src0
  if (src1_shape.size() < src0_shape.size()) {
    for (size_t i = src1_shape.size(); i < src0_shape.size(); ++i) {
      src1_shape.emplace_back(1);
    }
  }
  dnnl::memory::desc src0_desc = GetDefaultMemDesc(src0_shape);
  dnnl::memory::desc src1_desc = GetDefaultMemDesc(src1_shape);
  dnnl::memory::desc dst_desc = GetDefaultMemDesc(dst_shape);

  dnnl::binary::desc desc = dnnl::binary::desc(dnnl::algorithm::binary_div, src0_desc, src1_desc, dst_desc);
  auto prim_desc = dnnl::binary::primitive_desc(desc, MKLKernelEngine::Get().engine());
  primitive_ = std::make_shared<dnnl::binary>(prim_desc);
  AddArgument(DNNL_ARG_SRC_0, src0_desc);
  AddArgument(DNNL_ARG_SRC_1, src1_desc);
  AddArgument(DNNL_ARG_DST, dst_desc);
}

bool RealDivCPUKernel::Launch(const std::vector<kernel::AddressPtr> &inputs,
                              const std::vector<kernel::AddressPtr> & /*workspace*/,
                              const std::vector<kernel::AddressPtr> &outputs) {
  MS_LOG(INFO) << "RealDivCPUKernel: start Launch RealDiv";
  if (inputs.size() < 2 || outputs.empty()) {
    MS_LOG(EXCEPTION) << "RealDivCPUKernel: RealDiv error input output size!";
  }
  SetArgumentHandle(DNNL_ARG_SRC_0, inputs[0]->addr);
  SetArgumentHandle(DNNL_ARG_SRC_1, inputs[1]->addr);
  SetArgumentHandle(DNNL_ARG_DST, outputs[0]->addr);
  ExecutePrimitive();
  return true;
}
} // namespace kernel
} // namespace mindspore
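A small numpy sketch of the alignment rule above (hypothetical helper name, not part of the patch): appending trailing 1s to the divisor's shape is exactly what makes the tensor-divided-by-scalar case broadcast, since numpy's own broadcasting pads on the left instead:

```python
import numpy as np

def realdiv_reference(x, y):
    # mirror the kernel: append trailing 1s to the divisor's shape
    y = y.reshape(y.shape + (1,) * (x.ndim - y.ndim))
    return x / y

x = np.arange(1, 13, dtype=np.float32).reshape(3, 4)
s = np.array([2.0], dtype=np.float32)  # the tensor / scalar case
print(realdiv_reference(x, s))         # every element halved
```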
realdiv_cpu_kernel.h

@@ -0,0 +1,42 @@
/**
 * Copyright 2020 Huawei Technologies Co., Ltd
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
#ifndef MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_REALDIV_CPU_KERNEL_H_
#define MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_REALDIV_CPU_KERNEL_H_
#include <vector>
#include <memory>
#include "backend/kernel_compiler/cpu/mkldnn/mkl_cpu_kernel.h"

namespace mindspore {
namespace kernel {
class RealDivCPUKernel : public MKLCPUKernel {
 public:
  RealDivCPUKernel() = default;
  ~RealDivCPUKernel() override = default;

  void InitKernel(const CNodePtr &kernel_node) override;

  bool Launch(const std::vector<AddressPtr> &inputs, const std::vector<AddressPtr> &workspace,
              const std::vector<AddressPtr> &outputs) override;
};

MS_REG_CPU_KERNEL(
  RealDiv,
  KernelAttr().AddInputAttr(kNumberTypeFloat32).AddInputAttr(kNumberTypeFloat32).AddOutputAttr(kNumberTypeFloat32),
  RealDivCPUKernel);
} // namespace kernel
} // namespace mindspore

#endif // MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_REALDIV_CPU_KERNEL_H_
@@ -39,6 +39,7 @@ void ReduceCPUKernel::InitKernel(const CNodePtr &kernel_node) {
   }
   shape_ = AnfAlgo::GetInputDeviceShape(kernel_node, 0);
+  auto axis_addr = AnfAlgo::GetCNodePrimitive(kernel_node)->GetAttr(AXIS);

   if (axis_addr->isa<ValueTuple>()) {
     auto attr_axis = AnfAlgo::GetNodeAttr<std::vector<int>>(kernel_node, AXIS);
     if (attr_axis.size() > shape_.size()) {

@@ -47,18 +48,24 @@ void ReduceCPUKernel::InitKernel(const CNodePtr &kernel_node) {
       axis_.push_back(shape_.size() - 1);
     } else {
       for (auto axis : attr_axis) {
+        while (axis < 0) {
+          axis += SizeToInt(shape_.size());
+        }
         if (IntToSize(axis) >= (shape_.size())) {
           MS_LOG(EXCEPTION) << "axis value is oversize.";
         }
-        axis < 0 ? axis_.push_back(axis + shape_.size()) : axis_.push_back(axis);
+        axis_.push_back(IntToSize(axis));
       }
     }
   } else if (axis_addr->isa<Int32Imm>()) {
     int axis = AnfAlgo::GetNodeAttr<int>(kernel_node, AXIS);
-    if (axis >= 0 && IntToSize(axis) >= shape_.size()) {
+    while (axis < 0) {
+      axis += SizeToInt(shape_.size());
+    }
+    if (IntToSize(axis) >= shape_.size()) {
       MS_LOG(EXCEPTION) << "axis value is oversize.";
     }
-    axis < 0 ? axis_.push_back(axis + shape_.size()) : axis_.push_back(axis);
+    axis_.push_back(IntToSize(axis));
   } else {
     MS_LOG(EXCEPTION) << "Attribute axis type is invalid.";
   }
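The while-loop added above maps a negative axis into [0, rank) before the bounds check; an illustrative Python equivalent (hypothetical helper name, not part of the patch):

```python
def normalize_axis(axis, rank):
    # wrap negative axes the way the kernel's while-loop does, then bounds-check
    while axis < 0:
        axis += rank
    if axis >= rank:
        raise ValueError("axis value is oversize.")
    return axis

assert normalize_axis(-1, 4) == 3  # last axis
assert normalize_axis(2, 4) == 2   # already non-negative
```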
tile_cpu_kernel.cc

@@ -0,0 +1,173 @@
/**
 * Copyright 2020 Huawei Technologies Co., Ltd
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
#include "backend/kernel_compiler/cpu/tile_cpu_kernel.h"
#include "runtime/device/cpu/cpu_device_address.h"

#include <iostream>
#include <string>

namespace mindspore {
namespace kernel {
// product of the dimensions before `index`
size_t TileCPUKernel::GetOuterSize(std::vector<size_t> shape, int index) {
  size_t size = 1;
  for (int i = 0; i < index; i++) {
    size *= shape.at(i);
  }
  return size;
}

// product of the dimensions from `index` to the end
size_t TileCPUKernel::GetInnerSize(std::vector<size_t> shape, int index) {
  size_t size = 1;
  int shape_size = shape.size();
  for (int i = index; i < shape_size; i++) {
    size *= shape.at(i);
  }
  return size;
}

// byte size of one element of the input tensor type
size_t TileCPUKernel::GetTypeSize() {
  size_t type_size = 1;
  if (InputTensorType == kNumberTypeInt32) {
    type_size = sizeof(int);
  } else if (InputTensorType == kNumberTypeInt64) {
    type_size = sizeof(int64_t);
  } else if (InputTensorType == kNumberTypeInt16) {
    type_size = sizeof(int16_t);
  } else if (InputTensorType == kNumberTypeFloat32) {
    type_size = sizeof(float);
  }
  return type_size;
}

void TileCPUKernel::InitKernel(const CNodePtr &kernel_node) {
  MS_LOG(INFO) << "start Init Tile kernel";
  MS_EXCEPTION_IF_NULL(kernel_node);
  src0_shape = AnfAlgo::GetInputDeviceShape(kernel_node, 0);
  multiples = AnfAlgo::GetNodeAttr<std::vector<int>>(kernel_node, "multiples");
  dst_shape = AnfAlgo::GetOutputDeviceShape(kernel_node, 0);

  if (src0_shape.size() > multiples.size()) {
    MS_LOG(EXCEPTION) << "The rank of input is greater than multiples.";
  }

  // make src0_shape align to multiples by prepending leading 1s
  if (src0_shape.size() < multiples.size()) {
    for (size_t i = src0_shape.size(); i < multiples.size(); ++i) {
      src0_shape.emplace(src0_shape.begin(), 1);
    }
  }
  if (src0_shape.size() != multiples.size()) {
    MS_LOG(EXCEPTION) << "make src0_shape align to multiples failed.";
  }

  // workspace holds one output-sized buffer for the intermediate results
  size_t output_tensor_size = 1;
  InputTensorType = AnfAlgo::GetInputDeviceDataType(kernel_node, 0);
  for (auto each : dst_shape) {
    output_tensor_size *= each;
  }
  output_tensor_size *= GetTypeSize();
  workspace_size_list_.emplace_back(output_tensor_size);
}

bool TileCPUKernel::Launch(const std::vector<kernel::AddressPtr> &inputs,
                           const std::vector<kernel::AddressPtr> &workspace,
                           const std::vector<kernel::AddressPtr> &outputs) {
  MS_LOG(INFO) << "start Launch Tile kernel";
  if (inputs.empty() || outputs.empty()) {
    MS_LOG(EXCEPTION) << "input or output empty!";
  }
  if (inputs[0]->size > outputs[0]->size) {
    return false;
  }

  if (InputTensorType == kNumberTypeInt32) {
    LaunchKernel<int>(inputs, workspace, outputs);
  } else if (InputTensorType == kNumberTypeInt64) {
    LaunchKernel<int64_t>(inputs, workspace, outputs);
  } else if (InputTensorType == kNumberTypeInt16) {
    LaunchKernel<int16_t>(inputs, workspace, outputs);
  } else if (InputTensorType == kNumberTypeFloat32) {
    LaunchKernel<float>(inputs, workspace, outputs);
  }

  return true;
}

// Launch kernel based on input type: expand the input along one axis at a
// time, from the innermost axis outwards, using the workspace to hold the
// intermediate result of each pass.
template <typename T>
void TileCPUKernel::LaunchKernel(const std::vector<kernel::AddressPtr> &inputs,
                                 const std::vector<kernel::AddressPtr> &workspace,
                                 const std::vector<kernel::AddressPtr> &outputs) {
  size_t shape_size = src0_shape.size();
  size_t item_size = sizeof(T);

  auto X_ptr = reinterpret_cast<T *>(inputs[0]->addr);
  auto Y_ptr = reinterpret_cast<T *>(outputs[0]->addr);
  auto wksp_temp = reinterpret_cast<T *>(workspace[0]->addr);  // workspace to save the temp result

  auto start_of_wksp = reinterpret_cast<T *>(workspace[0]->addr);
  auto start_of_Yptr = reinterpret_cast<T *>(outputs[0]->addr);

  memcpy_s(Y_ptr, inputs[0]->size, X_ptr, inputs[0]->size);

  int data_count = 1;
  auto temp_dst_shape = src0_shape;  // shape of the temp result so far

  for (int axis_t = shape_size - 1; axis_t >= 0; axis_t--) {
    // stash the current result, then write it back `tile` times per outer block
    auto temp_dst_size = GetInnerSize(temp_dst_shape, 0) * item_size;
    memcpy_s(wksp_temp, temp_dst_size, Y_ptr, temp_dst_size);

    int outer_size = GetOuterSize(temp_dst_shape, axis_t);
    int inner_size = GetInnerSize(temp_dst_shape, axis_t);
    int tile = multiples[axis_t];

    int item_count = inner_size;
    data_count = item_count * item_size;

    for (int i = 0; i < outer_size; i++) {
      for (int t = 0; t < tile; t++) {
        memcpy_s(Y_ptr, data_count, wksp_temp, data_count);  // copy one inner block to Y
        Y_ptr += item_count;
      }
      wksp_temp += item_count;
    }
    // reset to start
    Y_ptr = start_of_Yptr;
    wksp_temp = start_of_wksp;
    temp_dst_shape[axis_t] = dst_shape[axis_t];  // this axis is now fully tiled
  }
}
} // namespace kernel
} // namespace mindspore
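The memcpy loop in `LaunchKernel` repeats whole inner blocks, one axis at a time from the innermost outwards, staging each pass in the workspace; a numpy reference model of that pass structure (hypothetical helper name, not part of the patch) reproduces `np.tile`:

```python
import numpy as np

def tile_reference(x, multiples):
    # align ranks by prepending 1s, as InitKernel does
    x = x.reshape((1,) * (len(multiples) - x.ndim) + x.shape)
    y = x
    for axis in range(len(multiples) - 1, -1, -1):
        outer = int(np.prod(y.shape[:axis]))
        inner = int(np.prod(y.shape[axis:]))  # an inner block spans axis..end
        t = multiples[axis]
        # repeat every inner block t times, mirroring the per-outer memcpy loop
        blocks = y.reshape(outer, inner)
        expanded = np.repeat(blocks[:, None, :], t, axis=1).reshape(outer, t * inner)
        y = expanded.reshape(y.shape[:axis] + (y.shape[axis] * t,) + y.shape[axis + 1:])
    return y

x = np.arange(6).reshape(2, 3)
assert (tile_reference(x, [2, 3]) == np.tile(x, [2, 3])).all()
```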
tile_cpu_kernel.h

@@ -0,0 +1,64 @@
/**
 * Copyright 2020 Huawei Technologies Co., Ltd
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
#ifndef MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_TILE_CPU_KERNEL_H_
#define MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_TILE_CPU_KERNEL_H_
#include <vector>
#include <memory>
#include "backend/kernel_compiler/cpu/cpu_kernel.h"
#include "backend/kernel_compiler/cpu/cpu_kernel_factory.h"

namespace mindspore {
namespace kernel {
class TileCPUKernel : public CPUKernel {
 public:
  TileCPUKernel() = default;
  ~TileCPUKernel() override = default;

  void InitKernel(const CNodePtr &kernel_node) override;

  bool Launch(const std::vector<AddressPtr> &inputs, const std::vector<AddressPtr> &workspace,
              const std::vector<AddressPtr> &outputs) override;

 private:
  // launch kernel based on input type
  template <typename T>
  void LaunchKernel(const std::vector<kernel::AddressPtr> &inputs,
                    const std::vector<kernel::AddressPtr> &workspace,
                    const std::vector<kernel::AddressPtr> &outputs);

  size_t GetOuterSize(std::vector<size_t>, int);
  size_t GetInnerSize(std::vector<size_t>, int);

  TypeId InputTensorType{kNumberTypeInt32};
  size_t GetTypeSize();

  std::vector<size_t> src0_shape;
  std::vector<int> multiples;
  std::vector<size_t> dst_shape;
};

MS_REG_CPU_KERNEL(Tile, KernelAttr().AddInputAttr(kNumberTypeFloat32).AddOutputAttr(kNumberTypeFloat32),
                  TileCPUKernel);
MS_REG_CPU_KERNEL(Tile, KernelAttr().AddInputAttr(kNumberTypeInt32).AddOutputAttr(kNumberTypeInt32),
                  TileCPUKernel);
MS_REG_CPU_KERNEL(Tile, KernelAttr().AddInputAttr(kNumberTypeInt64).AddOutputAttr(kNumberTypeInt64),
                  TileCPUKernel);
MS_REG_CPU_KERNEL(Tile, KernelAttr().AddInputAttr(kNumberTypeInt16).AddOutputAttr(kNumberTypeInt16),
                  TileCPUKernel);
} // namespace kernel
} // namespace mindspore

#endif // MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_TILE_CPU_KERNEL_H_
@ -0,0 +1,446 @@
/**
 * Copyright 2020 Huawei Technologies Co., Ltd
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include <limits>
#include "ctcloss_impl.cuh"
#include "runtime/device/gpu/cuda_common.h"
template <typename T>
__device__ T LogSumExp(const T logprob1, const T logprob2) {
  if (logprob1 == logprob2 && logprob1 == -std::numeric_limits<T>::infinity()) {
    return logprob1;
  } else {
    return (logprob1 > logprob2) ? logprob1 + log1pf(expf(logprob2 - logprob1))
                                 : logprob2 + log1pf(expf(logprob1 - logprob2));
  }
}

template <typename T>
__global__ void CalculateFwdVarKernel(T *log_alpha_b, int *label_value_with_blank, T *softmax_probs,
                                      const int *sequence_length, bool ctc_merge_repeated, int batch, int SOffSet,
                                      int maxtime, int blank, int *label_squence_length, int *cum_labels_length,
                                      bool ignore_longer_outputs_than_inputs) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < batch; i += blockDim.x * gridDim.x) {
    if (sequence_length[i] == 0 ||
        (ignore_longer_outputs_than_inputs && label_squence_length[i] > sequence_length[i])) {
      // empty input or ignorable over-long label: leave this batch element untouched
    } else {
      T *log_alpha_b_cur = &log_alpha_b[i * SOffSet * maxtime];
      int *label_value_with_blank_cur = &label_value_with_blank[0];
      if (i > 0) {
        label_value_with_blank_cur = &label_value_with_blank[2 * cum_labels_length[i - 1] + i];
      }
      int numclass = blank + 1;
      int U = 2 * label_squence_length[i] + 1;
      int Ti = sequence_length[i];
      int low = 0;
      int high = 0;
      log_alpha_b_cur[0] = log(softmax_probs[i * numclass + blank]);
      int label0 = blank;
      if (U > 1) {
        label0 = label_value_with_blank_cur[1];
        log_alpha_b_cur[maxtime] = log(softmax_probs[i * numclass + label0]);
      }
      for (int t = 1; t < Ti; ++t) {
        low = 0;
        high = U;
        int low_limit = U - (2 * (Ti - t));
        int high_limit = 2 * (t + 1);
        if (low_limit > low) {
          low = low_limit;
        }
        if (high_limit < U) {
          high = high_limit;
        }
        for (int u = low; u < high; ++u) {
          T sum_log_alpha = -std::numeric_limits<T>::infinity();
          if (ctc_merge_repeated || label_value_with_blank_cur[u] == blank) {
            sum_log_alpha = log_alpha_b_cur[u * maxtime + t - 1];
          }
          if (u > 0) {
            sum_log_alpha = LogSumExp(sum_log_alpha, log_alpha_b_cur[(u - 1) * maxtime + t - 1]);
          }
          if (u > 1) {
            const bool matching_labels_merge =
              ctc_merge_repeated && (label_value_with_blank_cur[u] == label_value_with_blank_cur[u - 2]);
            if (label_value_with_blank_cur[u] != blank && !matching_labels_merge) {
              sum_log_alpha = LogSumExp(sum_log_alpha, log_alpha_b_cur[(u - 2) * maxtime + t - 1]);
            }
          }
          log_alpha_b_cur[u * maxtime + t] =
            log(softmax_probs[i * numclass + label_value_with_blank_cur[u] + t * numclass * batch]) + sum_log_alpha;
        }
      }
    }
  }
}

template <typename T>
__global__ void CalculateBwdVarKernel(T *log_beta_b, int *label_value_with_blank, T *softmax_probs,
                                      const int *sequence_length, bool ctc_merge_repeated, int batch, int SOffSet,
                                      int maxtime, int blank, int *label_squence_length, int *cum_labels_length,
                                      bool ignore_longer_outputs_than_inputs) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < batch; i += blockDim.x * gridDim.x) {
    if (sequence_length[i] == 0 ||
        (ignore_longer_outputs_than_inputs && label_squence_length[i] > sequence_length[i])) {
      // empty input or ignorable over-long label: leave this batch element untouched
    } else {
      T *log_beta_b_cur = &log_beta_b[i * SOffSet * maxtime];
      int *label_value_with_blank_cur = &label_value_with_blank[0];
      if (i > 0) {
        label_value_with_blank_cur = &label_value_with_blank[2 * cum_labels_length[i - 1] + i];
      }
      int numclass = blank + 1;
      int U = 2 * label_squence_length[i] + 1;
      int Ti = sequence_length[i];
      int low = 0;
      int high = 0;
      if (U > 1) {
        for (int u = U - 2; u < U; ++u) {
          log_beta_b_cur[u * maxtime + Ti - 1] = 0;
        }
      } else {
        log_beta_b_cur[Ti - 1] = 0;
        log_beta_b_cur[Ti - 2] = 0;
      }
      for (int t = Ti - 2; t >= 0; --t) {
        low = 0;
        high = U;
        int low_limit = U - (2 * (Ti - t));
        int high_limit = 2 * (t + 1);
        if (low_limit > low) {
          low = low_limit;
        }
        if (high_limit < U) {
          high = high_limit;
        }
        for (int u = low; u < high; ++u) {
          if (ctc_merge_repeated || label_value_with_blank_cur[u] == blank) {
            log_beta_b_cur[u * maxtime + t] = LogSumExp(
              log_beta_b_cur[u * maxtime + t],
              log_beta_b_cur[u * maxtime + t + 1] +
                log(softmax_probs[i * numclass + label_value_with_blank_cur[u] + (t + 1) * numclass * batch]));
          }
          if (u + 1 < U) {
            log_beta_b_cur[u * maxtime + t] = LogSumExp(
              log_beta_b_cur[u * maxtime + t],
              log_beta_b_cur[(u + 1) * maxtime + t + 1] +
                log(softmax_probs[i * numclass + label_value_with_blank_cur[u + 1] + (t + 1) * numclass * batch]));
          }
          if (u + 2 < U) {
            const bool matching_labels_merge =
              ctc_merge_repeated && (label_value_with_blank_cur[u] == label_value_with_blank_cur[u + 2]);
            if (label_value_with_blank_cur[u] != blank && !matching_labels_merge) {
              log_beta_b_cur[u * maxtime + t] = LogSumExp(
                log_beta_b_cur[u * maxtime + t],
                log_beta_b_cur[(u + 2) * maxtime + t + 1] +
                  log(softmax_probs[i * numclass + label_value_with_blank_cur[u + 2] + (t + 1) * numclass * batch]));
            }
          }
        }
      }
    }
  }
}

template <typename T>
__global__ void ProbInitKernel(T *prob_num, int size) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < size; i += blockDim.x * gridDim.x) {
    prob_num[i] = -std::numeric_limits<T>::infinity();
  }
}
template <typename T>
__global__ void LogBInitKernel(T *log_b, int log_prob_size) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < log_prob_size; i += blockDim.x * gridDim.x) {
    log_b[i] = -std::numeric_limits<T>::infinity();
  }
}

template <typename T>
__global__ void CTCLossKernel(T *log_alpha_b, T *log_beta_b, T *softmax_probs, int *label_value_with_blank, int batch,
                              int SOffSet, int maxtime, int numclass, const int *sequence_length,
                              int *label_squence_length, int *cum_labels_length, T *cost, T *grads, T *prob_num,
                              bool ignore_longer_outputs_than_inputs) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < batch; i += blockDim.x * gridDim.x) {
    if (sequence_length[i] == 0 ||
        (ignore_longer_outputs_than_inputs && label_squence_length[i] > sequence_length[i])) {
      // empty input or ignorable over-long label: leave this batch element untouched
    } else {
      T *grad_cur = &grads[i * numclass];
      const T *softmax_probs_cur = &softmax_probs[i * numclass];
      T *prob_num_cur = &prob_num[i * numclass];
      int U = 2 * label_squence_length[i] + 1;
      T log_pzx = -std::numeric_limits<T>::infinity();
      const T *log_alpha_b_cur = &log_alpha_b[i * SOffSet * maxtime];
      const T *log_beta_b_cur = &log_beta_b[i * SOffSet * maxtime];
      int *label_value_with_blank_cur = &label_value_with_blank[0];
      if (i > 0) {
        label_value_with_blank_cur = &label_value_with_blank[2 * cum_labels_length[i - 1] + i];
      }
      for (int u = 0; u < U; ++u) {
        log_pzx = LogSumExp(log_pzx, log_alpha_b_cur[u * maxtime] + log_beta_b_cur[u * maxtime]);
      }
      cost[i] = -log_pzx;
      // grad
      int L = numclass;
      int Ti = sequence_length[i];
      if (log_pzx == -std::numeric_limits<T>::infinity()) {
        for (int t = 0; t < Ti; ++t) {
          for (int l = 0; l < L; ++l) {
            grad_cur[t * numclass * batch + l] = softmax_probs_cur[t * numclass * batch + l];
          }
        }
      } else {
        for (int t = 0; t < Ti; ++t) {
          for (int u = 0; u < U; ++u) {
            int l = label_value_with_blank_cur[u];
            prob_num_cur[t * batch * numclass + l] =
              LogSumExp(prob_num_cur[t * batch * numclass + l],
                        log_alpha_b_cur[u * maxtime + t] + log_beta_b_cur[u * maxtime + t]);
          }
          for (int l = 0; l < L; ++l) {
            grad_cur[t * numclass * batch + l] =
              softmax_probs_cur[t * numclass * batch + l] - expf(prob_num_cur[t * batch * numclass + l] - log_pzx);
          }
        }
      }
    }
  }
}

template <typename T>
__global__ void InnerSoftMaxKernel(const T *probs, T *softmax_probs, const int *sequence_length, int max_time,
                                   int batch, int numclass) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < batch * max_time; i += blockDim.x * gridDim.x) {
    int k = i / batch;
    int m = i % batch;
    if (k < sequence_length[m]) {
      T maxCoeff = 0.;
      T sumCoeff = 0.;
      for (int j = i * numclass; j < (i + 1) * numclass; ++j) {
        if (probs[j] > maxCoeff) {
          maxCoeff = probs[j];
        }
      }
      for (int j = i * numclass; j < (i + 1) * numclass; ++j) {
        sumCoeff += exp(probs[j] - maxCoeff);
        softmax_probs[j] = exp(probs[j] - maxCoeff);
      }
      for (int j = i * numclass; j < (i + 1) * numclass; ++j) {
        softmax_probs[j] /= sumCoeff;
      }
    }
  }
}

__global__ void GenLabelValuePCRKernel(int *label_value_sp, int *label_value_pcr, int *label_squence_length,
                                       int *cum_labels_length, int batch) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < batch; i += blockDim.x * gridDim.x) {
    int L = label_squence_length[i];
    label_squence_length[i] = 0;
    int offset = 0;
    if (i > 0) {
      offset = cum_labels_length[i - 1];
    }
    for (int l = offset; l < L; ++l) {
      if (l == offset || label_value_sp[l] != label_value_sp[l - 1]) {
        label_value_pcr[offset + label_squence_length[i]++] = label_value_sp[l];
      }
    }
  }
}

__global__ void UpdateLengthKernel(int *label_squence_length, int *cum_labels_length, int *max_labels_length,
                                   int batch) {
  max_labels_length[0] = 0;
  for (int i = 0; i < batch; ++i) {
    if (label_squence_length[i] > max_labels_length[0]) {
      max_labels_length[0] = label_squence_length[i];
    }
    if (i == 0) {
      cum_labels_length[i] = label_squence_length[i];
    } else {
      cum_labels_length[i] = label_squence_length[i] + cum_labels_length[i - 1];
    }
  }
}

template <typename T>
void CalculateBwdVar(T *log_beta_b, int *label_value_with_blank, T *softmax_probs, const int *sequence_length,
                     bool ctc_merge_repeated, int batch, int SOffSet, int maxtime, int blank,
                     int *label_squence_length, int *cum_labels_length, bool ignore_longer_outputs_than_inputs,
                     cudaStream_t stream) {
  int log_prob_size = SOffSet * batch * maxtime;
  LogBInitKernel<<<GET_BLOCKS(log_prob_size), GET_THREADS, 0, stream>>>(log_beta_b, log_prob_size);
  CalculateBwdVarKernel<<<GET_BLOCKS(batch), GET_THREADS, 0, stream>>>(
    log_beta_b, label_value_with_blank, softmax_probs, sequence_length, ctc_merge_repeated, batch, SOffSet, maxtime,
    blank, label_squence_length, cum_labels_length, ignore_longer_outputs_than_inputs);
}

template <typename T>
void CalculateFwdVar(T *log_alpha_b, int *label_value_with_blank, T *softmax_probs, const int *sequence_length,
                     bool ctc_merge_repeated, int batch, int SOffSet, int maxtime, int blank,
                     int *label_squence_length, int *cum_labels_length, bool ignore_longer_outputs_than_inputs,
                     cudaStream_t stream) {
  int log_prob_size = SOffSet * batch * maxtime;
  LogBInitKernel<<<GET_BLOCKS(log_prob_size), GET_THREADS, 0, stream>>>(log_alpha_b, log_prob_size);
  CalculateFwdVarKernel<<<GET_BLOCKS(batch), GET_THREADS, 0, stream>>>(
    log_alpha_b, label_value_with_blank, softmax_probs, sequence_length, ctc_merge_repeated, batch, SOffSet, maxtime,
    blank, label_squence_length, cum_labels_length, ignore_longer_outputs_than_inputs);
}

template <typename T>
void InnerSoftMax(const T *probs, T *softmax_probs, const int *sequence_length, int max_time, int batch, int numclass,
                  cudaStream_t stream) {
  InnerSoftMaxKernel<<<GET_BLOCKS(batch * max_time), GET_THREADS, 0, stream>>>(probs, softmax_probs, sequence_length,
                                                                               max_time, batch, numclass);
}

__global__ void GenLabelWithBlankKernel(int *label_value, int *label_value_with_blank, int *label_squence_length,
                                        int *precum_labels_length, int *cum_labels_length, int batch, int blank) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < batch; i += blockDim.x * gridDim.x) {
    int offset = 0;
    int offset1 = 0;
    if (i > 0) {
      offset = 2 * cum_labels_length[i - 1] + i;
      offset1 = precum_labels_length[i - 1];
    }
    for (int j = 0; j < label_squence_length[i]; ++j) {
      label_value_with_blank[offset + 2 * j] = blank;
      label_value_with_blank[offset + 2 * j + 1] = label_value[offset1 + j];
    }
    label_value_with_blank[offset + 2 * label_squence_length[i]] = blank;
  }
}

void GenLabelWithBlank(int *label_value, int *label_value_with_blank, int *label_squence_length,
                       int *precum_labels_length, int *cum_labels_length, int batch, int blank, cudaStream_t stream) {
  GenLabelWithBlankKernel<<<GET_BLOCKS(batch), GET_THREADS, 0, stream>>>(
    label_value, label_value_with_blank, label_squence_length, precum_labels_length, cum_labels_length, batch, blank);
}

void GenLabelValuePCR(int *label_value_sp, int *label_value_pcr, int *label_squence_length, int *cum_labels_length,
                      int *max_labels_length, int batch, cudaStream_t stream) {
  GenLabelValuePCRKernel<<<GET_BLOCKS(batch), GET_THREADS, 0, stream>>>(
    label_value_sp, label_value_pcr, label_squence_length, cum_labels_length, batch);
  UpdateLengthKernel<<<1, 1, 0, stream>>>(label_squence_length, cum_labels_length, max_labels_length, batch);
}

__global__ void GenLabelValueKernel(int *label_value_sp, const int64_t *label_indices, const int *label_values,
                                    int *label_squence_length, int *cum_labels_length, int size) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < size; i += blockDim.x * gridDim.x) {
    int64_t b = label_indices[i * 2];
    int offset = 0;
    if (b > 0) {
      offset = cum_labels_length[b - 1];
    }
    int64_t index = offset + label_indices[i * 2 + 1];
    label_value_sp[index] = label_values[i];
  }
}
__global__ void LabelValueInitKernel(int *label_value_sp, int size, int blank) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < size; i += blockDim.x * gridDim.x) {
    label_value_sp[i] = blank;
  }
}
__global__ void RecalculateLengthKernel(int *label_value_sp, int *label_squence_length, int *cum_labels_length,
                                        int batch, int blank) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < batch; i += blockDim.x * gridDim.x) {
    int offset = 0;
    if (i > 0) {
      offset = cum_labels_length[i - 1];
    }
    int L = label_squence_length[i];
    label_squence_length[i] = 0;
    for (int j = offset; j < offset + L; ++j) {
      if (label_value_sp[j] >= blank) {
        break;
      } else {
        label_squence_length[i]++;
      }
    }
  }
}
void GenLabelValue(int *label_value_sp, const int64_t *label_indices, const int *label_values,
                   int *label_squence_length, int *cum_labels_length, int *max_labels_length, int size, int blank,
                   int batch, cudaStream_t stream) {
  LabelValueInitKernel<<<GET_BLOCKS(size), GET_THREADS, 0, stream>>>(label_value_sp, size, blank);
  GenLabelValueKernel<<<GET_BLOCKS(size), GET_THREADS, 0, stream>>>(label_value_sp, label_indices, label_values,
                                                                    label_squence_length, cum_labels_length, size);
  RecalculateLengthKernel<<<GET_BLOCKS(batch), GET_THREADS, 0, stream>>>(label_value_sp, label_squence_length,
                                                                         cum_labels_length, batch, blank);
  UpdateLengthKernel<<<1, 1, 0, stream>>>(label_squence_length, cum_labels_length, max_labels_length, batch);
}

__global__ void CalculatePreLengthKernel(int *label_squence_length, int *precum_labels_length, int *cum_labels_length,
                                         int *max_labels_length, const int64_t *label_indices, int batch, int size) {
  max_labels_length[0] = 0;
  for (int i = 0; i < size; ++i) {
    label_squence_length[label_indices[i * 2]]++;
    if (max_labels_length[0] < label_indices[i * 2]) {
      max_labels_length[0] = label_indices[i * 2];
    }
  }
  precum_labels_length[0] = label_squence_length[0];
  cum_labels_length[0] = label_squence_length[0];
  for (int i = 1; i < batch; ++i) {
    cum_labels_length[i] = cum_labels_length[i - 1] + label_squence_length[i];
    precum_labels_length[i] = precum_labels_length[i - 1] + label_squence_length[i];
  }
}

__global__ void CalculateMaxSequenceKernel(const int *sequence_length, int *max_labels_length, int batch) {
  max_labels_length[0] = 0;
  for (int i = 0; i < batch; ++i) {
    if (sequence_length[i] > max_labels_length[0]) {
      max_labels_length[0] = sequence_length[i];
    }
  }
}

void CalculateMaxSequence(const int *sequence_length, int *max_labels_length, int batch, cudaStream_t stream) {
  CalculateMaxSequenceKernel<<<1, 1, 0, stream>>>(sequence_length, max_labels_length, batch);
}

void CalculatePreLength(int *label_squence_length, int *precum_labels_length, int *cum_labels_length,
                        int *max_labels_length, const int64_t *label_indices, int batch, int size,
                        cudaStream_t stream) {
  CalculatePreLengthKernel<<<1, 1, 0, stream>>>(label_squence_length, precum_labels_length, cum_labels_length,
                                                max_labels_length, label_indices, batch, size);
}

template <typename T>
void CTCLoss(T *log_alpha_b, T *log_beta_b, T *softmax_probs, int *label_value_with_blank, int batch, int SOffSet,
             int maxtime, int numclass, const int *sequence_length, int *label_squence_length, int *cum_labels_length,
             T *cost, T *grads, T *prob_num, bool ignore_longer_outputs_than_inputs, cudaStream_t stream) {
  ProbInitKernel<<<GET_BLOCKS(maxtime * batch * numclass), GET_THREADS, 0, stream>>>(prob_num,
                                                                                     maxtime * batch * numclass);
  CTCLossKernel<<<GET_BLOCKS(batch), GET_THREADS, 0, stream>>>(
    log_alpha_b, log_beta_b, softmax_probs, label_value_with_blank, batch, SOffSet, maxtime, numclass, sequence_length,
    label_squence_length, cum_labels_length, cost, grads, prob_num, ignore_longer_outputs_than_inputs);
}

template void CalculateFwdVar<float>(float *log_alpha_b, int *label_value_with_blank, float *softmax_probs,
                                     const int *sequence_length, bool ctc_merge_repeated, int batch, int SOffSet,
                                     int maxtime, int blank, int *label_squence_length, int *cum_labels_length,
                                     bool ignore_longer_outputs_than_inputs, cudaStream_t stream);

template void CalculateBwdVar<float>(float *log_beta_b, int *label_value_with_blank, float *softmax_probs,
                                     const int *sequence_length, bool ctc_merge_repeated, int batch, int SOffSet,
                                     int maxtime, int blank, int *label_squence_length, int *cum_labels_length,
                                     bool ignore_longer_outputs_than_inputs, cudaStream_t stream);

template void InnerSoftMax<float>(const float *probs, float *softmax_probs, const int *sequence_length, int max_time,
                                  int batch, int numclass, cudaStream_t stream);

template void CTCLoss<float>(float *log_alpha_b, float *log_beta_b, float *softmax_probs, int *label_value_with_blank,
                             int batch, int SOffSet, int maxtime, int numclass, const int *sequence_length,
                             int *label_squence_length, int *cum_labels_length, float *cost, float *grads,
                             float *prob_num, bool ignore_longer_outputs_than_inputs, cudaStream_t stream);
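Reviewer note: the kernels above are a log-space implementation of the standard CTC recursions (Graves et al., 2006) over the blank-augmented label sequence l' of length U = 2|l| + 1. LSE below stands for the LogSumExp device function, and the u-2 term enters only when l'_u is not blank and, under ctc_merge_repeated, differs from l'_{u-2}:

```
\alpha_t(u) = \log y_t(l'_u) + \mathrm{LSE}\big(\alpha_{t-1}(u),\ \alpha_{t-1}(u-1),\ [\alpha_{t-1}(u-2)]\big)

\mathrm{cost}_i = -\log p(l \mid x) = -\mathrm{LSE}_u\big(\alpha_0(u) + \beta_0(u)\big)

\mathrm{grad}_t(k) = y_t(k) - \frac{1}{p(l \mid x)} \sum_{u\,:\,l'_u = k} e^{\alpha_t(u) + \beta_t(u)}
```

The second line is what CTCLossKernel accumulates into log_pzx, and the third is the gradient it writes via prob_num; when log_pzx is -inf (no feasible alignment) the gradient falls back to the raw softmax probabilities, as in the first branch of that kernel.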
@ -0,0 +1,51 @@
/**
 * Copyright 2020 Huawei Technologies Co., Ltd
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#ifndef MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_GPU_IMPL_CUH
#define MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_GPU_IMPL_CUH

template <typename T>
void CalculateFwdVar(T *log_alpha_b, int *label_value_with_blank, T *softmax_probs, const int *sequence_length,
                     bool ctc_merge_repeated, int batch, int SOffSet, int maxtime, int blank,
                     int *label_squence_length, int *cum_labels_length, bool ignore_longer_outputs_than_inputs,
                     cudaStream_t stream);

template <typename T>
void CalculateBwdVar(T *log_beta_b, int *label_value_with_blank, T *softmax_probs, const int *sequence_length,
                     bool ctc_merge_repeated, int batch, int SOffSet, int maxtime, int blank,
                     int *label_squence_length, int *cum_labels_length, bool ignore_longer_outputs_than_inputs,
                     cudaStream_t stream);

template <typename T>
void InnerSoftMax(const T *probs, T *softmax_cost, const int *sequence_length, int max_time, int batch, int numclass,
                  cudaStream_t stream);

void GenLabelValuePCR(int *label_value_sp, int *label_value_pcr, int *label_squence_length, int *cum_labels_length,
                      int *max_labels_length, int batch, cudaStream_t stream);

void GenLabelWithBlank(int *label_value, int *label_value_with_blank, int *label_squence_length,
                       int *precum_labels_length, int *cum_labels_length, int batch, int blank, cudaStream_t stream);

void GenLabelValue(int *label_value_sp, const int64_t *label_indices, const int *label_values,
                   int *label_squence_length, int *cum_labels_length, int *max_labels_length, int size, int blank,
                   int batch, cudaStream_t stream);

void CalculatePreLength(int *label_squence_length, int *precum_labels_length, int *cum_labels_length,
                        int *max_labels_length, const int64_t *label_indices, int batch, int size,
                        cudaStream_t stream);
void CalculateMaxSequence(const int *sequence_length, int *max_labels_length, int batch, cudaStream_t stream);
template <typename T>
void CTCLoss(T *log_alpha_b, T *log_beta_b, T *softmax_probs, int *label_value_with_blank, int batch, int SOffSet,
             int maxtime, int numclass, const int *sequence_length, int *label_squence_length, int *cum_labels_length,
             T *cost, T *grads, T *prob_num, bool ignore_longer_outputs_than_inputs, cudaStream_t stream);
#endif  // MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_GPU_IMPL_CUH
@ -44,8 +44,12 @@ MS_REG_GPU_KERNEL_ONE(Square, KernelAttr().AddInputAttr(kNumberTypeFloat16).AddO
                      UnaryOpGpuKernel, half)
MS_REG_GPU_KERNEL_ONE(Sqrt, KernelAttr().AddInputAttr(kNumberTypeFloat32).AddOutputAttr(kNumberTypeFloat32),
                      UnaryOpGpuKernel, float)
MS_REG_GPU_KERNEL_ONE(Sqrt, KernelAttr().AddInputAttr(kNumberTypeFloat16).AddOutputAttr(kNumberTypeFloat16),
                      UnaryOpGpuKernel, half)
MS_REG_GPU_KERNEL_ONE(Rsqrt, KernelAttr().AddInputAttr(kNumberTypeFloat32).AddOutputAttr(kNumberTypeFloat32),
                      UnaryOpGpuKernel, float)
MS_REG_GPU_KERNEL_ONE(Rsqrt, KernelAttr().AddInputAttr(kNumberTypeFloat16).AddOutputAttr(kNumberTypeFloat16),
                      UnaryOpGpuKernel, half)
MS_REG_GPU_KERNEL_ONE(Sin, KernelAttr().AddInputAttr(kNumberTypeFloat32).AddOutputAttr(kNumberTypeFloat32),
                      UnaryOpGpuKernel, float)
MS_REG_GPU_KERNEL_ONE(Sin, KernelAttr().AddInputAttr(kNumberTypeFloat16).AddOutputAttr(kNumberTypeFloat16),
@ -18,10 +18,10 @@
namespace mindspore {
namespace kernel {
MS_REG_GPU_KERNEL_ONE(CTCLossV2,
MS_REG_GPU_KERNEL_ONE(CTCLoss,
                      KernelAttr()
                        .AddInputAttr(kNumberTypeFloat32)
                        .AddInputAttr(kNumberTypeInt32)
                        .AddInputAttr(kNumberTypeInt64)
                        .AddInputAttr(kNumberTypeInt32)
                        .AddInputAttr(kNumberTypeInt32)
                        .AddOutputAttr(kNumberTypeFloat32)
@ -19,64 +19,111 @@
#include <cuda_runtime_api.h>
#include <vector>
#include <limits>
#include "backend/kernel_compiler/gpu/gpu_kernel.h"
#include "backend/kernel_compiler/gpu/gpu_kernel_factory.h"
#include "runtime/device/gpu/gpu_memory_allocator.h"

#include "backend/kernel_compiler/gpu/cuda_impl/ctcloss_impl.cuh"
namespace mindspore {
namespace kernel {
template <typename T>
class CtcLossGpuKernel : public GpuKernel {
 public:
  CtcLossGpuKernel()
    : cudnn_handle_(nullptr),
      probs_desc_(nullptr),
      ctcloss_desc_(nullptr),
    : label_indice_size_(0),
      label_size_(0),
      input_lengths_size_(0),
      label_lengths_size_(0) {}
  ~CtcLossGpuKernel() override { DestroyResource(); }
      squence_lengths_size_(0),
      preprocess_collapse_repeated_(false),
      ctc_merge_repeated_(true),
      ignore_longer_outputs_than_inputs_(false) {}
  ~CtcLossGpuKernel() override = default;

  const std::vector<size_t> &GetInputSizeList() const override { return input_size_list_; }
  const std::vector<size_t> &GetOutputSizeList() const override { return output_size_list_; }
  const std::vector<size_t> &GetWorkspaceSizeList() const override { return workspace_size_list_; }

  bool Launch(const std::vector<AddressPtr> &inputs, const std::vector<AddressPtr> &,
  bool Launch(const std::vector<AddressPtr> &inputs, const std::vector<AddressPtr> &workspace,
              const std::vector<AddressPtr> &outputs, void *stream_ptr) override {
    float *probs = GetDeviceAddress<float>(inputs, 0);
    float *costs = GetDeviceAddress<float>(outputs, 0);
    float *grads = GetDeviceAddress<float>(outputs, 1);

    // Copy labels/input_lengths/label_length to host as cudnn7.x.x requires
    int *labels_host = nullptr;
    int *no_blank_labels_host = nullptr;
    void *input_lengths_host = nullptr;
    void *label_lengths_host = nullptr;
    cudaStream_t stream = reinterpret_cast<cudaStream_t>(stream_ptr);
    AllocHostMem(&labels_host, &no_blank_labels_host, &input_lengths_host, &label_lengths_host, inputs);
    CopyToHostSync(labels_host, no_blank_labels_host, input_lengths_host, label_lengths_host, inputs, stream);
    const T *probs = GetDeviceAddress<T>(inputs, 0);
    const int64_t *label_indices = GetDeviceAddress<int64_t>(inputs, 1);
    const int *label_values = GetDeviceAddress<int>(inputs, 2);
    const int *sequence_length = GetDeviceAddress<int>(inputs, 3);
    T *costs = GetDeviceAddress<T>(outputs, 0);
    T *grads = GetDeviceAddress<T>(outputs, 1);
    T *softmax_probs = GetDeviceAddress<T>(workspace, 0);
    int *cum_labels_length = GetDeviceAddress<int>(workspace, 1);
    int *label_squence_length = GetDeviceAddress<int>(workspace, 2);
    int *label_value_sp = GetDeviceAddress<int>(workspace, 3);
    int *label_value_pcr = GetDeviceAddress<int>(workspace, 4);
    T *prob_num = GetDeviceAddress<T>(workspace, 5);
    int *precum_labels_length = GetDeviceAddress<int>(workspace, 6);
    int *max_labels_length = GetDeviceAddress<int>(workspace, 7);
    int numclass = SizeToInt(probs_dims_[2]);
    int batch = SizeToInt(probs_dims_[1]);
    int max_time = SizeToInt(probs_dims_[0]);
    int max_sequence = 0;
    CalculateMaxSequence(sequence_length, max_labels_length, batch, stream);
    CHECK_CUDA_RET_WITH_EXCEPT(
      cudaMemcpyAsync(&max_sequence, max_labels_length, sizeof(int), cudaMemcpyDeviceToHost, stream),
      "cudaMemcpyAsync failed.");
    CHECK_CUDA_RET_WITH_EXCEPT(cudaStreamSynchronize(stream), "cudaStreamSynchronize failed.");
    if (max_time < max_sequence) {
      MS_LOG(EXCEPTION) << "max_time should be greater than sequence length.";
    }
    InnerSoftMax(probs, softmax_probs, sequence_length, max_time, batch, numclass, stream);
    MemsetForWS(label_value_pcr, cum_labels_length, label_squence_length, costs, grads, stream);
    int max_labels_length_host = 0;
    int batch_label = 0;
    int *label_value_with_blank = nullptr;
    T *log_alpha_b = nullptr;
    T *log_beta_b = nullptr;
    CalculatePreLength(label_squence_length, precum_labels_length, cum_labels_length, max_labels_length, label_indices,
                       batch, label_size_ / sizeof(int), stream);
    CHECK_CUDA_RET_WITH_EXCEPT(
      cudaMemcpyAsync(&batch_label, max_labels_length, sizeof(int), cudaMemcpyDeviceToHost, stream),
      "cudaMemcpyAsync failed.");
    CHECK_CUDA_RET_WITH_EXCEPT(cudaStreamSynchronize(stream), "cudaStreamSynchronize failed.");
    if (batch != batch_label + 1) {
      MS_LOG(EXCEPTION) << "label batch should be equal to input batch.";
    }
    GenLabelValue(label_value_sp, label_indices, label_values, label_squence_length, cum_labels_length,
                  max_labels_length, label_size_ / sizeof(int), numclass - 1, batch, stream);
    if (preprocess_collapse_repeated_) {
      GenLabelValuePCR(label_value_sp, label_value_pcr, label_squence_length, cum_labels_length, max_labels_length,
                       batch, stream);
    }
    CHECK_CUDA_RET_WITH_EXCEPT(
      cudaMemcpyAsync(&max_labels_length_host, max_labels_length, sizeof(int), cudaMemcpyDeviceToHost, stream),
      "cudaMemcpyAsync failed.");
    CHECK_CUDA_RET_WITH_EXCEPT(cudaStreamSynchronize(stream), "cudaStreamSynchronize failed.");
    int SOffSet = 2 * max_labels_length_host + 1;
    int log_prob_size = batch * SOffSet * max_time;
    if (!ignore_longer_outputs_than_inputs_ && max_labels_length_host > max_time) {
      MS_LOG(EXCEPTION) << "output size is greater than input size.";
    }
    MemManageForCus(&log_alpha_b, &log_beta_b, &label_value_with_blank, cum_labels_length, log_prob_size, batch,
                    stream);

    size_t workspace_size = 0;
    CHECK_CUDNN_RET_WITH_EXCEPT(
      cudnnGetCTCLossWorkspaceSize(
        cudnn_handle_, probs_desc_, probs_desc_, reinterpret_cast<int *>(no_blank_labels_host),
        reinterpret_cast<int *>(label_lengths_host), reinterpret_cast<int *>(input_lengths_host),
        CUDNN_CTC_LOSS_ALGO_DETERMINISTIC, ctcloss_desc_, &workspace_size),
      "cudnnGetCTCLossWorkspaceSize failed.");
    void *workspace = device::gpu::GPUMemoryAllocator::GetInstance().AllocTensorMem(workspace_size);
    if (workspace == nullptr) {
      MS_LOG(EXCEPTION) << "Failed to alloc workspace, size: " << workspace_size;
    if (preprocess_collapse_repeated_) {
      GenLabelWithBlank(label_value_pcr, label_value_with_blank, label_squence_length, precum_labels_length,
                        cum_labels_length, batch, numclass - 1, stream);
    } else {
      GenLabelWithBlank(label_value_sp, label_value_with_blank, label_squence_length, precum_labels_length,
                        cum_labels_length, batch, numclass - 1, stream);
    }

    CHECK_CUDNN_RET_WITH_EXCEPT(
      cudnnCTCLoss(cudnn_handle_, probs_desc_, probs, reinterpret_cast<int *>(no_blank_labels_host),
                   reinterpret_cast<int *>(label_lengths_host), reinterpret_cast<int *>(input_lengths_host), costs,
                   probs_desc_, grads, CUDNN_CTC_LOSS_ALGO_DETERMINISTIC, ctcloss_desc_, workspace, workspace_size),
      "cudnnCtcLoss failed.");
    CalculateFwdVar(log_alpha_b, label_value_with_blank, softmax_probs, sequence_length, ctc_merge_repeated_, batch,
                    SOffSet, max_time, numclass - 1, label_squence_length, cum_labels_length,
                    ignore_longer_outputs_than_inputs_, stream);
    CalculateBwdVar(log_beta_b, label_value_with_blank, softmax_probs, sequence_length, ctc_merge_repeated_, batch,
                    SOffSet, max_time, numclass - 1, label_squence_length, cum_labels_length,
                    ignore_longer_outputs_than_inputs_, stream);
    CTCLoss(log_alpha_b, log_beta_b, softmax_probs, label_value_with_blank, batch, SOffSet, max_time, numclass,
            sequence_length, label_squence_length, cum_labels_length, costs, grads, prob_num,
            ignore_longer_outputs_than_inputs_, stream);
    CHECK_CUDA_RET_WITH_EXCEPT(cudaStreamSynchronize(stream), "cudaStreamSynchronize failed.");

    device::gpu::GPUMemoryAllocator::GetInstance().FreeTensorMem(workspace);
    FreeHostMem(labels_host, no_blank_labels_host, input_lengths_host, label_lengths_host);
    FreeMem(label_value_with_blank, log_alpha_b, log_beta_b);
    return true;
  }
  bool Init(const CNodePtr &kernel_node) override {
@ -88,104 +135,98 @@ class CtcLossGpuKernel : public GpuKernel {
    probs_dims_[0] = probs_shape[0];
    probs_dims_[1] = probs_shape[1];
    probs_dims_[2] = probs_shape[2];

    auto labels_dims = AnfAlgo::GetPrevNodeOutputInferShape(kernel_node, 1);
    if (labels_dims.size() != 1 && labels_dims.size() != 2) {
    auto indice_dims = AnfAlgo::GetPrevNodeOutputInferShape(kernel_node, 1);
    auto labels_dims = AnfAlgo::GetPrevNodeOutputInferShape(kernel_node, 2);
    if (labels_dims.size() != 1) {
      MS_LOG(EXCEPTION) << "labels dims: " << labels_dims.size() << " not support.";
    }
    if (indice_dims.size() != 2) {
      MS_LOG(EXCEPTION) << "labels indice dims: " << indice_dims.size() << " not support.";
    }
    label_size_ = sizeof(int);
    for (auto i : labels_dims) {
      label_size_ *= i;
    }

    auto input_length_dims = AnfAlgo::GetPrevNodeOutputInferShape(kernel_node, 2);
    input_lengths_size_ = input_length_dims[0] * sizeof(int);
    auto label_length_dims = AnfAlgo::GetPrevNodeOutputInferShape(kernel_node, 3);
    label_lengths_size_ = label_length_dims[0] * sizeof(int);
    CHECK_CUDNN_RET_WITH_EXCEPT(
      cudnnSetTensorNdDescriptorEx(probs_desc_, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, 3, probs_dims_),
      "cudnnSetTensorNdDescriptorEx failed.");
    CHECK_CUDNN_RET_WITH_EXCEPT(cudnnSetCTCLossDescriptorEx(ctcloss_desc_, CUDNN_DATA_FLOAT,
                                                            CUDNN_LOSS_NORMALIZATION_SOFTMAX, CUDNN_PROPAGATE_NAN),
                                "cudnnSetCTCLossDescriptorEx failed.");
    label_indice_size_ = sizeof(int64_t);
    for (auto i : indice_dims) {
      label_indice_size_ *= i;
    }
    auto squence_length_dims = AnfAlgo::GetPrevNodeOutputInferShape(kernel_node, 3);
    squence_lengths_size_ = squence_length_dims[0] * sizeof(int);
    preprocess_collapse_repeated_ = GetAttr<bool>(kernel_node, "preprocess_collapse_repeated");
    ctc_merge_repeated_ = GetAttr<bool>(kernel_node, "ctc_merge_repeated");
    ignore_longer_outputs_than_inputs_ = GetAttr<bool>(kernel_node, "ignore_longer_outputs_than_inputs");
    InitSizeLists();
    return true;
  }

 protected:
  void InitResource() override {
    cudnn_handle_ = device::gpu::GPUDeviceManager::GetInstance().GetCudnnHandle();
    CHECK_CUDNN_RET_WITH_EXCEPT(cudnnCreateTensorDescriptor(&probs_desc_), "cudnnCreateTensorDescriptor failed.");
    CHECK_CUDNN_RET_WITH_EXCEPT(cudnnCreateCTCLossDescriptor(&ctcloss_desc_), "cudnnCreateCTCLossDescriptor failed.");
  }

  void InitSizeLists() override {
    input_size_list_.push_back(probs_dims_[0] * probs_dims_[1] * probs_dims_[2] * sizeof(float));
    input_size_list_.push_back(probs_dims_[0] * probs_dims_[1] * probs_dims_[2] * sizeof(T));
    input_size_list_.push_back(label_indice_size_);
    input_size_list_.push_back(label_size_);
    input_size_list_.push_back(input_lengths_size_);
    input_size_list_.push_back(label_lengths_size_);

    output_size_list_.push_back(probs_dims_[1] * sizeof(float));
    output_size_list_.push_back(probs_dims_[0] * probs_dims_[1] * probs_dims_[2] * sizeof(float));
    input_size_list_.push_back(squence_lengths_size_);
    workspace_size_list_.push_back(probs_dims_[0] * probs_dims_[1] * probs_dims_[2] * sizeof(T));
    workspace_size_list_.push_back(squence_lengths_size_);
    workspace_size_list_.push_back(squence_lengths_size_);
    workspace_size_list_.push_back(label_size_);
    workspace_size_list_.push_back(label_size_);
    workspace_size_list_.push_back(probs_dims_[0] * probs_dims_[1] * probs_dims_[2] * sizeof(T));
    workspace_size_list_.push_back(squence_lengths_size_);
    workspace_size_list_.push_back(sizeof(int));
    output_size_list_.push_back(probs_dims_[1] * sizeof(T));
    output_size_list_.push_back(probs_dims_[0] * probs_dims_[1] * probs_dims_[2] * sizeof(T));
  }

 private:
  void DestroyResource() noexcept {
    CHECK_CUDNN_RET_WITH_ERROR(cudnnDestroyCTCLossDescriptor(ctcloss_desc_), "cudnnDestroyCTCLossDescriptor failed.");
    CHECK_CUDNN_RET_WITH_ERROR(cudnnDestroyTensorDescriptor(probs_desc_), "cudnnDestroyTensorDescriptor failed.");
  }

  void AllocHostMem(int **labels_host, int **no_blank_labels_host, void **input_lengths_host,
                    void **label_lengths_host, const std::vector<AddressPtr> &inputs) {
    CHECK_CUDA_RET_WITH_EXCEPT(cudaMallocHost(labels_host, inputs[1]->size), "cudaMallocHost failed.");
    CHECK_CUDA_RET_WITH_EXCEPT(cudaMallocHost(no_blank_labels_host, inputs[1]->size), "cudaMallocHost failed.");
    CHECK_CUDA_RET_WITH_EXCEPT(cudaMallocHost(input_lengths_host, inputs[2]->size), "cudaMallocHost failed.");
    CHECK_CUDA_RET_WITH_EXCEPT(cudaMallocHost(label_lengths_host, inputs[3]->size), "cudaMallocHost failed.");
  }

  void FreeHostMem(int *labels_host, int *no_blank_labels_host, void *input_lengths_host, void *label_lengths_host) {
    CHECK_CUDA_RET_WITH_EXCEPT(cudaFreeHost(label_lengths_host), "cudaFreeHost failed.");
    CHECK_CUDA_RET_WITH_EXCEPT(cudaFreeHost(input_lengths_host), "cudaFreeHost failed.");
    CHECK_CUDA_RET_WITH_EXCEPT(cudaFreeHost(labels_host), "cudaFreeHost failed.");
    CHECK_CUDA_RET_WITH_EXCEPT(cudaFreeHost(no_blank_labels_host), "cudaFreeHost failed.");
  }

  void CopyToHostSync(int *labels_host, int *no_blank_labels_host, void *input_lengths_host, void *label_lengths_host,
                      const std::vector<AddressPtr> &inputs, cudaStream_t stream) {
    CHECK_CUDA_RET_WITH_EXCEPT(cudaStreamSynchronize(stream), "cudaStreamSynchronize failed.");
  void MemsetForWS(int *label_value_pcr, int *cum_labels_length, int *label_squence_length, T *costs, T *grads,
                   cudaStream_t stream) {
    CHECK_CUDA_RET_WITH_EXCEPT(cudaMemsetAsync(label_value_pcr, static_cast<int>(0), label_size_, stream),
                               "cudaMemSet failed in CtcLossGpuKernel::Launch.");
    CHECK_CUDA_RET_WITH_EXCEPT(cudaMemsetAsync(cum_labels_length, static_cast<int>(0), squence_lengths_size_, stream),
                               "cudaMemSet failed in CtcLossGpuKernel::Launch.");
    CHECK_CUDA_RET_WITH_EXCEPT(
      cudaMemcpyAsync(labels_host, inputs[1]->addr, inputs[1]->size, cudaMemcpyDeviceToHost, stream),
      "cudaMemcpyAsync failed.");
      cudaMemsetAsync(label_squence_length, static_cast<int>(0), squence_lengths_size_, stream),
      "cudaMemSet failed in CtcLossGpuKernel::Launch.");
    CHECK_CUDA_RET_WITH_EXCEPT(cudaMemsetAsync(costs, static_cast<T>(0), probs_dims_[1] * sizeof(T), stream),
                               "cudaMemSet failed in CtcLossGpuKernel::Launch.");
    CHECK_CUDA_RET_WITH_EXCEPT(
      cudaMemcpyAsync(input_lengths_host, inputs[2]->addr, inputs[2]->size, cudaMemcpyDeviceToHost, stream),
      "cudaMemcpyAsync failed.");
    CHECK_CUDA_RET_WITH_EXCEPT(
      cudaMemcpyAsync(label_lengths_host, inputs[3]->addr, inputs[3]->size, cudaMemcpyDeviceToHost, stream),
      cudaMemsetAsync(grads, static_cast<T>(0), probs_dims_[0] * probs_dims_[1] * probs_dims_[2] * sizeof(T), stream),
      "cudaMemSet failed in CtcLossGpuKernel::Launch.");
  }
  void MemManageForCus(T **log_alpha_b, T **log_beta_b, int **label_value_with_blank, int *cum_labels_length,
                       int log_prob_size, int batch, cudaStream_t stream) {
    int total_labels_size_host = 0;
    CHECK_CUDA_RET_WITH_EXCEPT(cudaMalloc(reinterpret_cast<void **>(log_alpha_b), sizeof(T) * log_prob_size),
                               "cudaMalloc failed.");
    CHECK_CUDA_RET_WITH_EXCEPT(cudaMalloc(reinterpret_cast<void **>(log_beta_b), sizeof(T) * log_prob_size),
                               "cudaMalloc failed.");
    CHECK_CUDA_RET_WITH_EXCEPT(cudaMemcpyAsync(&total_labels_size_host, cum_labels_length + batch - 1, sizeof(int),
                                               cudaMemcpyDeviceToHost, stream),
                               "cudaMemcpyAsync failed.");
    CHECK_CUDA_RET_WITH_EXCEPT(cudaStreamSynchronize(stream), "cudaStreamSynchronize failed.");
    CHECK_CUDA_RET_WITH_EXCEPT(
      cudaMalloc(reinterpret_cast<void **>(label_value_with_blank), sizeof(int) * (2 * total_labels_size_host + batch)),
      "cudaMalloc failed.");
  }

    // remove blank element
    size_t j = 0;
    for (size_t i = 0; i < inputs[1]->size / sizeof(int); i++) {
      if (labels_host[i] != 0) {
        no_blank_labels_host[j] = labels_host[i];
        j++;
      }
    }
  void FreeMem(int *label_value_with_blank, T *log_alpha_b, T *log_beta_b) {
    CHECK_CUDA_RET_WITH_EXCEPT(cudaFree(label_value_with_blank), "cudaFree failed.");
    CHECK_CUDA_RET_WITH_EXCEPT(cudaFree(log_alpha_b), "cudaFree failed.");
    CHECK_CUDA_RET_WITH_EXCEPT(cudaFree(log_beta_b), "cudaFree failed.");
  }

  std::vector<size_t> input_size_list_;
  std::vector<size_t> output_size_list_;
  std::vector<size_t> workspace_size_list_;

  cudnnHandle_t cudnn_handle_;
  cudnnTensorDescriptor_t probs_desc_;
  cudnnCTCLossDescriptor_t ctcloss_desc_;
  int probs_dims_[3] = {0};
  size_t probs_dims_[3] = {0};
  int label_indice_size_;
  int label_size_;
  int input_lengths_size_;
  int label_lengths_size_;
};
  int squence_lengths_size_;
  bool preprocess_collapse_repeated_;
  bool ctc_merge_repeated_;
  bool ignore_longer_outputs_than_inputs_;
  T kLogZero_ = -std::numeric_limits<T>::infinity();
};
}  // namespace kernel
}  // namespace mindspore
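Reviewer note: the eight workspace buffers that Launch fetches with GetDeviceAddress(workspace, 0..7) line up, in order, with the eight workspace_size_list_ entries pushed in InitSizeLists above:

```
// index  buffer                  size pushed in InitSizeLists
// 0      softmax_probs           max_time * batch * numclass * sizeof(T)
// 1      cum_labels_length       squence_lengths_size_ (batch * sizeof(int))
// 2      label_squence_length    squence_lengths_size_
// 3      label_value_sp          label_size_
// 4      label_value_pcr         label_size_
// 5      prob_num                max_time * batch * numclass * sizeof(T)
// 6      precum_labels_length    squence_lengths_size_
// 7      max_labels_length       sizeof(int)
```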
@ -66,7 +66,6 @@ class MirrorPadGpuFwdKernel : public GpuKernel {
    }

    string mode = GetValue<string>(AnfAlgo::GetCNodePrimitive(kernel_node)->GetAttr("mode"));

    if (mode == "REFLECT") {
      mode_ = 0;  // reflected mirroring
    } else {
@ -66,7 +66,6 @@ class MirrorPadGpuBackKernel : public GpuKernel {
    }

    string mode = GetValue<string>(AnfAlgo::GetCNodePrimitive(kernel_node)->GetAttr("mode"));

    if (mode == "REFLECT") {
      mode_ = 0;  // reflected mirroring
    } else {
@ -27,6 +27,5 @@ MS_REG_GPU_KERNEL_ONE(
  ROIAlign,
  KernelAttr().AddInputAttr(kNumberTypeFloat16).AddInputAttr(kNumberTypeFloat16).AddOutputAttr(kNumberTypeFloat16),
  ROIAlignGpuFwdKernel, half)

}  // namespace kernel
}  // namespace mindspore
@ -27,6 +27,5 @@ MS_REG_GPU_KERNEL_ONE(
  ROIAlignGrad,
  KernelAttr().AddInputAttr(kNumberTypeFloat16).AddInputAttr(kNumberTypeFloat16).AddOutputAttr(kNumberTypeFloat16),
  ROIAlignGradGpuFwdKernel, half)

}  // namespace kernel
}  // namespace mindspore
@ -87,6 +87,7 @@ static std::map<string, string> tbe_func_adapter_map = {
  {"apply_centered_rms_prop", "apply_centered_rms_prop_d"},
  {"transpose", "transpose_d"},
  {"fill", "fill_d"},
  {"i_fmr", "ifmr"},
  {"unsorted_segment_sum", "unsorted_segment_sum_d"},
  {"unsorted_segment_prod", "unsorted_segment_prod_d"},
  {"concat", "concat_d"},
@ -139,7 +140,8 @@ static std::map<string, string> tbe_func_adapter_map = {
  {"inplace_update", "inplace_update_d"},
  {"matrix_diag", "matrix_diag_d"},
  {"matrix_diag_part", "matrix_diag_part_d"},
  {"matrix_set_diag", "matrix_set_diag_d"}};
  {"matrix_set_diag", "matrix_set_diag_d"},
  {"l_stm_input_grad", "lstm_input_grad"}};

void TbeAdapter::NormalizeFuncName(std::string *func_name) {
  if (func_name == nullptr) {
@ -25,6 +25,7 @@
#include "backend/kernel_compiler/tbe/tbe_convert_utils.h"
#include "backend/kernel_compiler/tbe/tbe_utils.h"
#include "utils/ms_context.h"
#include "runtime/dev.h"

namespace mindspore {
namespace kernel {
@ -86,6 +87,8 @@ constexpr auto kJPyModulePath = "py_module_path";
constexpr auto kJPreBuildOutsAttrs = "prebuild_outs_attrs";
constexpr auto kJKwdArgs = "kwds_args";
constexpr auto kJListArgs = "list_args";
constexpr auto kJSocVersion = "socVersion";
constexpr auto kSOC_VERSION = "SOC_VERSION";

bool TbeKernelJsonCreator::GenTbeSingleKernelJson(const std::shared_ptr<mindspore::AnfNode> &anf_node,
                                                  nlohmann::json *kernel_json) {
@ -122,6 +125,8 @@ bool TbeKernelJsonCreator::GenTbeSingleKernelJson(const std::shared_ptr<mindspor
  nlohmann::json attrs_json;
  (void)GenTbeAttrJson(anf_node, op_info_ptr, &attrs_json);
  op_info_json[kJAttrs] = attrs_json;
  auto soc_version = TbeKernelJsonCreator::GetSocVersion();
  op_info_json[kJSocVersion] = soc_version;
  std::string json_str = op_info_json.dump();
  size_t hash_id = std::hash<std::string>()(json_str);
  auto context_ptr = MsContext::GetInstance();
@ -150,7 +155,13 @@ bool TbeKernelJsonCreator::GenInputDescJson(const std::shared_ptr<AnfNode> &anf_
  MS_EXCEPTION_IF_NULL(input_ptr);
  MS_EXCEPTION_IF_NULL(input_list);
  std::string op_name = AnfAlgo::GetCNodeName(anf_node);
  if (input_ptr->name() == "input_indices" && op_name == kTopKOpName) {
  if (op_name == kDynamicRNNOpName && input_ptr->name() == "seq_length") {
    nlohmann::json input_desc_json;
    auto in_name = input_ptr->name();
    input_desc_json[kJName] = in_name + std::to_string(input_i);
    input_desc_json[kJValid] = false;
    input_list->emplace_back(input_desc_json);
  } else if (input_ptr->name() == "input_indices" && op_name == kTopKOpName) {
    TbeAdapter::GenTopKV2IndicesTensorInfo(anf_node, real_input_index, input_list, creater_type_);
  } else {
    auto dtype = GetDeviceInputType(anf_node, real_input_index);
@ -408,6 +419,30 @@ bool TbeKernelJsonCreator::GenTbeAttrJson(const std::shared_ptr<AnfNode> &anf_no
  return true;
}

string TbeKernelJsonCreator::GetSocVersion() {
  // Get default soc version.
  const int kSocVersionLen = 50;
  char soc_version[kSocVersionLen] = {0};
  auto ret = rtGetSocVersion(soc_version, kSocVersionLen);
  if (ret != RT_ERROR_NONE) {
    MS_LOG(EXCEPTION) << "GetSocVersion failed.";
  }
  MS_LOG(INFO) << "Default SocVersion is " << soc_version;
  // Get soc version from env value.
  const char *soc_version_env = getenv(kSOC_VERSION);
  if (soc_version_env != nullptr) {
    if (std::strcmp(soc_version, soc_version_env) != 0) {
      MS_LOG(WARNING) << "SocVersion changed to " << soc_version_env;
      ret = rtSetSocVersion(soc_version_env);
      if (ret != RT_ERROR_NONE) {
        MS_LOG(EXCEPTION) << "SetSocVersion to " << soc_version_env << " failed, errorno: " << ret;
      }
      return soc_version_env;
    }
  }
  return soc_version;
}

void TbeKernelJsonCreator::ParseAttrValue(const std::string &type, const mindspore::ValuePtr &value,
                                          nlohmann::json *attr_obj) {
  MS_EXCEPTION_IF_NULL(value);
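Reviewer note: the override order implemented by GetSocVersion (an explicit SOC_VERSION environment value beats the default reported by the runtime) can be exercised in isolation with a sketch like the following; ResolveSocVersion and the Ascend version strings are illustrative, not MindSpore APIs:

```
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <string>

// Mirrors the getenv(kSOC_VERSION) branch above: the environment value wins
// whenever it is set and differs from the runtime default.
std::string ResolveSocVersion(const std::string &default_version) {
  const char *env = std::getenv("SOC_VERSION");
  if (env != nullptr && std::strcmp(default_version.c_str(), env) != 0) {
    return env;
  }
  return default_version;
}

int main() {
  // e.g. running with SOC_VERSION=Ascend910 prints Ascend910
  std::cout << ResolveSocVersion("Ascend310") << std::endl;
  return 0;
}
```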
@ -624,6 +659,8 @@ bool TbeKernelBuild::GenFusionScopeJson(const std::vector<mindspore::AnfNodePtr>
  index = 0;
  data_list.insert(data_list.end(), compute_list.begin(), compute_list.end());
  (*fusion_json)[kFusionOpList] = data_list;
  auto soc_version = TbeKernelJsonCreator::GetSocVersion();
  (*fusion_json)[kJSocVersion] = soc_version;
  return true;
}

@ -853,6 +890,7 @@ bool TbeKernelBuild::GenFusionDataInputJson(const std::shared_ptr<mindspore::Anf
  (*data_str)[kJName] = name;
  nlohmann::json output_desc;
  output_desc[kJName] = name;
  output_desc[kJDataType] = 0;
  output_desc[kJShape] = "NULL";
  output_desc_list.push_back(output_desc);
  (*index)++;
@ -985,6 +1023,7 @@ bool TbeKernelBuild::GenFusionComputeInputJson(const mindspore::CNodePtr &cnode,
  for (size_t i = 0; i < optional_num; ++i) {
    nlohmann::json optional_input_desc;
    optional_input_desc[kJName] = std::string(kOptional) + std::to_string(*index);
    optional_input_desc[kJShape] = "NULL";
    (*index)++;
    (*layer_iter)->emplace_back(nullptr);
    input_desc_list_tmp.emplace_back(optional_input_desc);
@ -92,6 +92,7 @@ class TbeKernelJsonCreator {
  std::string json_name() { return json_name_; }
  bool GenTbeAttrJson(const std::shared_ptr<AnfNode> &anf_node, const std::shared_ptr<OpInfo> &op_info,
                      nlohmann::json *attrs_json);
  static string GetSocVersion();

 private:
  bool GenTbeInputsJson(const std::shared_ptr<AnfNode> &anf_node, const std::shared_ptr<OpInfo> &op_info,
@ -19,6 +19,7 @@
#include <memory>
#include <string>
#include "backend/optimizer/common/optimizer.h"
#include "backend/optimizer/ascend/ir_fission/dynamic_rnn_grad_fission.h"
#include "backend/optimizer/ascend/ir_fission/bn_split.h"
#include "backend/optimizer/ascend/ir_fission/bn_grad_split.h"
#include "backend/optimizer/ascend/ir_fission/batch_norm_grad_split.h"
@ -64,6 +65,7 @@
#include "backend/optimizer/ascend/format_type/rectify_do_mask_kernel_info.h"
#include "backend/optimizer/ascend/format_type/chang_axis_of_reduce_kernel.h"
#include "backend/optimizer/ascend/format_type/split_unsupported_transdata.h"
#include "backend/optimizer/ascend/format_type/insert_reshape_for_extract_image_patches_op.h"
#include "backend/optimizer/pass/getitem_tuple.h"
#include "backend/optimizer/pass/optimize_dependence.h"
#include "backend/optimizer/pass/erase_visit_attr.h"
@ -106,6 +108,7 @@
#include "backend/optimizer/ascend/ir_fission/concat_fission.h"
#include "backend/optimizer/ascend/ir_fission/pack_fission.h"
#include "backend/optimizer/ascend/enhancer/concat_outputs_for_all_gather.h"
#include "backend/optimizer/ascend/enhancer/add_placeholder_for_dynamic_rnn.h"
#include "utils/ms_context.h"
#include "backend/optimizer/graph_kernel/composite_ops_fusion.h"
#include "backend/optimizer/graph_kernel/basic_ops_fusion.h"
@ -231,6 +234,7 @@ void AscendMixPrecision(const std::shared_ptr<session::KernelGraph> &kernel_grap
  auto optimizer = std::make_shared<GraphOptimizer>();
  auto mixed_precision_pm = std::make_shared<PassManager>("cast_pm");
  mixed_precision_pm->AddPass(std::make_shared<InsertCast>());
  mixed_precision_pm->AddPass(std::make_shared<InsertReshapeForExtractImagePatchesOp>());
  mixed_precision_pm->AddPass(std::make_shared<GetitemTuple>());
  mixed_precision_pm->AddPass(std::make_shared<CommonSubexpressionElimination>());
  mixed_precision_pm->AddPass(std::make_shared<EliminateRedundantOp>());
@ -276,6 +280,8 @@ void AscendBackendIRFusionOptimization(const std::shared_ptr<session::KernelGrap
  }
  ir_fusion_pm->AddPass(std::make_shared<LayerNormGradSplit>());
  ir_fusion_pm->AddPass(std::make_shared<InsertPadForNMSWithMask>());
  ir_fusion_pm->AddPass(std::make_shared<InsertPlaceholderForDynamicRNN>());
  ir_fusion_pm->AddPass(std::make_shared<DynamicRNNGradFission>());
  AddAscendIRFusionRulesPass(ir_fusion_pm.get());
  AddAscendIRFusionPass(ir_fusion_pm.get());

@ -0,0 +1,77 @@
/**
 * Copyright 2020 Huawei Technologies Co., Ltd
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include "backend/optimizer/ascend/enhancer/add_placeholder_for_dynamic_rnn.h"
#include <vector>
#include <memory>
#include "backend/optimizer/common/helper.h"
#include "backend/session/anf_runtime_algorithm.h"
#include "utils/utils.h"
#include "abstract/abstract_value.h"
#include "base/core_ops.h"

namespace mindspore {
namespace opt {
const BaseRef InsertPlaceholderForDynamicRNN::DefinePattern() const {
  std::shared_ptr<Var> V = std::make_shared<CondVar>(UnVisited);
  std::shared_ptr<Var> Xs = std::make_shared<SeqVar>();
  return VectorRef({V, Xs});
}

const AnfNodePtr InsertPlaceholderForDynamicRNN::Process(const FuncGraphPtr &func_graph, const AnfNodePtr &node,
                                                         const EquivPtr &) const {
  MS_EXCEPTION_IF_NULL(func_graph);
  MS_EXCEPTION_IF_NULL(node);
  auto cnode = node->cast<CNodePtr>();
  MS_EXCEPTION_IF_NULL(cnode);
  auto op_name = AnfAlgo::GetCNodeName(cnode);
  if (op_name != kDynamicRNNOpName) {
    return nullptr;
  }
  AnfAlgo::SetNodeAttr(kAttrVisited, MakeValue(true), node);
  auto kernel_graph = func_graph->cast<std::shared_ptr<session::KernelGraph>>();
  MS_EXCEPTION_IF_NULL(kernel_graph);
  size_t input_num = AnfAlgo::GetInputTensorNum(node);
  if (input_num == 0) {
    return nullptr;
  }

  std::vector<AnfNodePtr> new_inputs = {AnfAlgo::GetCNodePrimitiveNode(cnode)};
  for (size_t in_idx = 0; in_idx < input_num; in_idx++) {
    auto input_node = AnfAlgo::GetInputNode(cnode, in_idx);
    if (in_idx == 3) {
      auto value = std::make_shared<None>();
      auto value_node = NewValueNode(value);
      value_node->set_abstract(std::make_shared<abstract::AbstractNone>());
      auto new_node = kernel_graph->NewValueNode(value_node);
      kernel_graph->AddValueNodeToGraph(new_node);
      new_inputs.push_back(new_node);
    }
    new_inputs.push_back(input_node);
  }

  CNodePtr new_node = nullptr;
  if (kernel_graph == nullptr) {
    new_node = std::make_shared<CNode>(*cnode);
  } else {
    new_node = kernel_graph->NewCNode(cnode);
  }
  MS_EXCEPTION_IF_NULL(new_node);
  new_node->set_inputs(new_inputs);
  return new_node;
}
}  // namespace opt
}  // namespace mindspore

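The pass above rewrites DynamicRNN's input list so that the optional seq_length slot (input index 3) is occupied by an explicit None value node. A minimal sketch of that rewriting loop, using strings in place of the AnfNode machinery (the names here are illustrative stand-ins, not MindSpore API):

```cpp
// Placeholder insertion sketch: "none" is inserted *before* the original
// input at index 3, shifting the remaining inputs right, exactly what the
// pass does for DynamicRNN's unset seq_length slot. Illustrative only.
#include <iostream>
#include <string>
#include <vector>

int main() {
  const std::vector<std::string> inputs = {"x", "w", "b", "init_h", "init_c"};
  std::vector<std::string> new_inputs = {"prim:DynamicRNN"};  // primitive node comes first
  for (size_t in_idx = 0; in_idx < inputs.size(); in_idx++) {
    if (in_idx == 3) {
      new_inputs.push_back("none");  // placeholder for the missing seq_length input
    }
    new_inputs.push_back(inputs[in_idx]);
  }
  for (const auto &s : new_inputs) {
    std::cout << s << ' ';
  }
  std::cout << '\n';  // prim:DynamicRNN x w b none init_h init_c
}
```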
@ -0,0 +1,37 @@
/**
 * Copyright 2020 Huawei Technologies Co., Ltd
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
#ifndef MINDSPORE_CCSRC_PRE_ACTIVATE_ASCEND_ENHANCER_ADD_PLACEHOLDER_FOR_DYNAMIC_RNN_H
#define MINDSPORE_CCSRC_PRE_ACTIVATE_ASCEND_ENHANCER_ADD_PLACEHOLDER_FOR_DYNAMIC_RNN_H

#include <memory>
#include <vector>
#include "backend/optimizer/common/optimizer.h"
#include "backend/optimizer/ascend/ascend_helper.h"

namespace mindspore {
namespace opt {
class InsertPlaceholderForDynamicRNN : public PatternProcessPass {
 public:
  explicit InsertPlaceholderForDynamicRNN(bool multigraph = true)
      : PatternProcessPass("add_placeholder_for_dynamic_rnn", multigraph) {}
  ~InsertPlaceholderForDynamicRNN() override = default;
  const BaseRef DefinePattern() const override;
  const AnfNodePtr Process(const FuncGraphPtr &, const AnfNodePtr &, const EquivPtr &) const override;
};
}  // namespace opt
}  // namespace mindspore

#endif  // MINDSPORE_CCSRC_PRE_ACTIVATE_ASCEND_ENHANCER_ADD_PLACEHOLDER_FOR_DYNAMIC_RNN_H

@ -0,0 +1,65 @@
/**
 * Copyright 2020 Huawei Technologies Co., Ltd
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include "backend/optimizer/ascend/format_type/insert_reshape_for_extract_image_patches_op.h"
#include <memory>
#include "backend/optimizer/ascend/ascend_helper.h"
#include "backend/session/anf_runtime_algorithm.h"
#include "utils/utils.h"
#include "base/core_ops.h"

namespace mindspore {
namespace opt {
const BaseRef InsertReshapeForExtractImagePatchesOp::DefinePattern() const {
  VarPtr Xs = std::make_shared<SeqVar>();
  return VectorRef({prim::kPrimExtractImagePatches, Xs});
}

const AnfNodePtr InsertReshapeForExtractImagePatchesOp::Process(const FuncGraphPtr &func_graph, const AnfNodePtr &node,
                                                                const EquivPtr &equiv) const {
  MS_EXCEPTION_IF_NULL(func_graph);
  MS_EXCEPTION_IF_NULL(equiv);
  auto extract = CheckAnfNodeIfCNodeAndInputSize(node, 2);
  MS_EXCEPTION_IF_NULL(extract);
  auto in_node = extract->input(1);
  MS_EXCEPTION_IF_NULL(in_node);
  auto extract_kernel_build_info = AnfAlgo::GetSelectKernelBuildInfo(extract);
  auto in_node_kernel_build_info = AnfAlgo::GetSelectKernelBuildInfo(in_node);
  MS_EXCEPTION_IF_NULL(extract_kernel_build_info);
  MS_EXCEPTION_IF_NULL(in_node_kernel_build_info);
  std::vector<AnfNodePtr> reshape_inputs = {NewValueNode(std::make_shared<Primitive>(prim::kPrimReshape->name())),
                                            in_node};
  auto reshape_builder = std::make_shared<kernel::KernelBuildInfo::KernelBuildInfoBuilder>();
  reshape_builder->SetInputsFormat({kOpFormat_NC1HWC0});
  reshape_builder->SetOutputsFormat({kOpFormat_NC1HWC0});
  reshape_builder->SetInputsDeviceType({AnfAlgo::GetOutputDeviceDataType(in_node, 0)});
  reshape_builder->SetOutputsDeviceType({AnfAlgo::GetOutputDeviceDataType(in_node, 0)});
  reshape_builder->SetKernelType(in_node_kernel_build_info->kernel_type());
  reshape_builder->SetFusionType(in_node_kernel_build_info->fusion_type());
  reshape_builder->SetProcessor(in_node_kernel_build_info->processor());

  auto reshape = func_graph->NewCNode(reshape_inputs);
  reshape->set_scope(in_node->scope());
  auto shape_tmp = AnfAlgo::GetOutputInferShape(in_node, 0);
  AnfAlgo::SetOutputInferTypeAndShape({AnfAlgo::GetOutputDeviceDataType(in_node, 0)},
                                      {{shape_tmp[0], shape_tmp[2], shape_tmp[3], shape_tmp[1]}}, reshape.get());
  AnfAlgo::SetSelectKernelBuildInfo(reshape_builder->Build(), reshape.get());
  AnfAlgo::SetNodeAttr("nop_op", MakeValue(true), reshape);
  AnfAlgo::SetNodeInput(extract, reshape, 0);
  return extract;
}
}  // namespace opt
}  // namespace mindspore

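The Reshape node built above carries an inferred shape that reorders the producer's NCHW dimensions into NHWC, which is what the {shape_tmp[0], shape_tmp[2], shape_tmp[3], shape_tmp[1]} expression computes. A self-contained sketch of that permutation, with plain std::vector standing in for the real shape type:

```cpp
// NCHW -> NHWC shape permutation via indices {0, 2, 3, 1}, matching the
// expression in the pass above. Illustrative only.
#include <cassert>
#include <cstddef>
#include <vector>

std::vector<std::size_t> PermuteNchwToNhwc(const std::vector<std::size_t> &shape) {
  assert(shape.size() == 4);
  return {shape[0], shape[2], shape[3], shape[1]};
}

int main() {
  std::vector<std::size_t> nchw = {32, 16, 224, 224};  // N, C, H, W
  auto nhwc = PermuteNchwToNhwc(nchw);                 // {32, 224, 224, 16}
  assert(nhwc[1] == 224 && nhwc[3] == 16);
  return 0;
}
```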
@ -0,0 +1,41 @@
/**
 * Copyright 2020 Huawei Technologies Co., Ltd
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#ifndef MINDSPORE_CCSRC_BACKEND_OPTIMIZER_ASCEND_FORMAT_TYPE_INSERT_RESHAPE_FOR_EXTRACT_IMAGE_PATCHES_OP_H
#define MINDSPORE_CCSRC_BACKEND_OPTIMIZER_ASCEND_FORMAT_TYPE_INSERT_RESHAPE_FOR_EXTRACT_IMAGE_PATCHES_OP_H
#include <vector>
#include <string>
#include <utility>
#include <memory>
#include "ir/anf.h"
#include "backend/optimizer/common/pattern_engine.h"
#include "backend/optimizer/common/helper.h"
#include "backend/optimizer/common/optimizer.h"

namespace mindspore {
namespace opt {
class InsertReshapeForExtractImagePatchesOp : public PatternProcessPass {
 public:
  explicit InsertReshapeForExtractImagePatchesOp(bool multigraph = true)
      : PatternProcessPass("insert_reshape_for_extract_image_patches_op", multigraph) {}
  ~InsertReshapeForExtractImagePatchesOp() override = default;
  const BaseRef DefinePattern() const override;
  const AnfNodePtr Process(const FuncGraphPtr &, const AnfNodePtr &, const EquivPtr &) const override;
};
}  // namespace opt
}  // namespace mindspore

#endif  // MINDSPORE_CCSRC_BACKEND_OPTIMIZER_ASCEND_FORMAT_TYPE_INSERT_RESHAPE_FOR_EXTRACT_IMAGE_PATCHES_OP_H

@ -0,0 +1,250 @@
/**
 * Copyright 2020 Huawei Technologies Co., Ltd
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
#include "backend/optimizer/ascend/ir_fission/dynamic_rnn_grad_fission.h"
#include <vector>
#include <memory>
#include <algorithm>
#include "backend/session/anf_runtime_algorithm.h"
#include "backend/optimizer/common/helper.h"

namespace mindspore {
namespace opt {
constexpr size_t kDynamicRNNGradInputNum = 16;
constexpr size_t kLSTMInputGradOutputNum = 4;
const BaseRef DynamicRNNGradFission::DefinePattern() const {
  VarPtr Xs = std::make_shared<SeqVar>();
  return VectorRef({prim::kPrimDynamicRNNGrad, Xs});
}

AnfNodePtr CreateSplitVD(const FuncGraphPtr &graph, const AnfNodePtr &node) {
  MS_EXCEPTION_IF_NULL(graph);
  MS_EXCEPTION_IF_NULL(node);
  // SplitV
  std::vector<AnfNodePtr> splitvd_input = {NewValueNode(std::make_shared<Primitive>(prim::kPrimSplitV->name())), node};
  auto split_vd = graph->NewCNode(splitvd_input);
  MS_EXCEPTION_IF_NULL(split_vd);
  auto dtypes = {AnfAlgo::GetOutputInferDataType(node, 0), AnfAlgo::GetOutputInferDataType(node, 0)};
  std::vector<size_t> shape = {AnfAlgo::GetOutputInferShape(node, 0)[0] - 1, AnfAlgo::GetOutputInferShape(node, 0)[1],
                               AnfAlgo::GetOutputInferShape(node, 0)[2]};
  auto shape2 = {IntToSize(1), AnfAlgo::GetOutputInferShape(node, 0)[1], AnfAlgo::GetOutputInferShape(node, 0)[2]};
  std::vector<std::vector<size_t>> shapes = {shape, shape2};
  AnfAlgo::SetOutputInferTypeAndShape(dtypes, shapes, split_vd.get());
  AnfAlgo::SetNodeAttr("split_dim", MakeValue(0), split_vd);
  AnfAlgo::SetNodeAttr("num_split", MakeValue(2), split_vd);
  int tmp = SizeToInt(AnfAlgo::GetOutputInferShape(node, 0)[0]) - 1;
  AnfAlgo::SetNodeAttr("size_splits", MakeValue(std::vector<int>{tmp, 1}), split_vd);
  AnfAlgo::SetNodeAttr("is_backend_insert", MakeValue(true), split_vd);
  return split_vd;
}

AnfNodePtr CreateLSTMInputGrad(const FuncGraphPtr &graph, const AnfNodePtr &node) {
  MS_EXCEPTION_IF_NULL(graph);
  MS_EXCEPTION_IF_NULL(node);
  auto cnode = node->cast<CNodePtr>();
  MS_EXCEPTION_IF_NULL(cnode);
  const auto &dynamic_rnn_grad_inputs = cnode->inputs();
  std::vector<AnfNodePtr> lstm_input_grad_inputs = {NewValueNode(std::make_shared<Primitive>(kLSTMInputGradOpName)),
                                                    dynamic_rnn_grad_inputs[2],
                                                    dynamic_rnn_grad_inputs[6],
                                                    dynamic_rnn_grad_inputs[8],
                                                    dynamic_rnn_grad_inputs[9],
                                                    dynamic_rnn_grad_inputs[10],
                                                    dynamic_rnn_grad_inputs[11],
                                                    dynamic_rnn_grad_inputs[12],
                                                    dynamic_rnn_grad_inputs[13],
                                                    dynamic_rnn_grad_inputs[14],
                                                    dynamic_rnn_grad_inputs[15],
                                                    dynamic_rnn_grad_inputs[16]};
  std::vector<AnfNodePtr> ori_outputs;
  CreateMultipleOutputsOfAnfNode(graph, node, 5, &ori_outputs);
  auto lstm_op = graph->NewCNode(lstm_input_grad_inputs);
  MS_EXCEPTION_IF_NULL(lstm_op);
  auto ori_type = AnfAlgo::GetOutputInferDataType(dynamic_rnn_grad_inputs[8], 0);
  auto types = {AnfAlgo::GetOutputInferDataType(ori_outputs[2], 0), AnfAlgo::GetOutputInferDataType(ori_outputs[3], 0),
                AnfAlgo::GetOutputInferDataType(ori_outputs[4], 0), ori_type};
  std::vector<size_t> ori_shape = {AnfAlgo::GetOutputInferShape(dynamic_rnn_grad_inputs[8], 0)[0],
                                   AnfAlgo::GetOutputInferShape(dynamic_rnn_grad_inputs[8], 0)[1],
                                   4 * AnfAlgo::GetOutputInferShape(dynamic_rnn_grad_inputs[8], 0)[2]};
  auto shapes = {AnfAlgo::GetOutputInferShape(ori_outputs[2], 0), AnfAlgo::GetOutputInferShape(ori_outputs[3], 0),
                 AnfAlgo::GetOutputInferShape(ori_outputs[4], 0), ori_shape};
  AnfAlgo::SetOutputInferTypeAndShape(types, shapes, lstm_op.get());
  return lstm_op;
}

AnfNodePtr CreateBatchMatMul(const FuncGraphPtr &graph, const AnfNodePtr &node1, const AnfNodePtr &node2) {
  MS_EXCEPTION_IF_NULL(graph);
  MS_EXCEPTION_IF_NULL(node1);
  MS_EXCEPTION_IF_NULL(node2);
  // BatchMatMul
  std::vector<AnfNodePtr> matmul_inputs = {NewValueNode(std::make_shared<Primitive>(prim::kPrimBatchMatMul->name())),
                                           node2, node1};
  auto batch_matmul = graph->NewCNode(matmul_inputs);
  MS_EXCEPTION_IF_NULL(batch_matmul);
  auto types = {AnfAlgo::GetOutputInferDataType(node1, 0)};
  std::vector<size_t> shape = {AnfAlgo::GetOutputInferShape(node2, 0)[0], AnfAlgo::GetOutputInferShape(node2, 0)[2],
                               AnfAlgo::GetOutputInferShape(node1, 0)[2]};
  auto shapes = {shape};
  AnfAlgo::SetNodeAttr("is_backend_insert", MakeValue(true), batch_matmul);
  AnfAlgo::SetNodeAttr("transpose_x1", MakeValue(true), batch_matmul);
  AnfAlgo::SetNodeAttr("transpose_x2", MakeValue(false), batch_matmul);
  AnfAlgo::SetOutputInferTypeAndShape(types, shapes, batch_matmul.get());
  return batch_matmul;
}

AnfNodePtr AddHConcatD(const FuncGraphPtr &graph, const AnfNodePtr &node1, const AnfNodePtr &node2) {
  MS_EXCEPTION_IF_NULL(graph);
  MS_EXCEPTION_IF_NULL(node1);
  MS_EXCEPTION_IF_NULL(node2);
  std::vector<AnfNodePtr> ori_outputs;
  CreateMultipleOutputsOfAnfNode(graph, node2, 2, &ori_outputs);
  auto ori_shape = AnfAlgo::GetOutputInferShape(node1, 0);
  std::vector<std::vector<size_t>> shape_tmp;
  if (ori_shape.size() == 3) {
    shape_tmp = {ori_shape};
  } else {
    shape_tmp = {{IntToSize(1), ori_shape[0], ori_shape[1]}};
  }
  auto ori_dtype = {AnfAlgo::GetOutputInferDataType(node1, 0)};
  // reshape
  std::vector<AnfNodePtr> reshape_input = {NewValueNode(std::make_shared<Primitive>(prim::kPrimReshape->name())),
                                           node1};
  auto reshape = graph->NewCNode(reshape_input);
  AnfAlgo::SetNodeAttr(kAttrVisited, MakeValue(true), reshape);
  AnfAlgo::SetOutputInferTypeAndShape(ori_dtype, shape_tmp, reshape.get());

  // concatd --> concat
  std::vector<AnfNodePtr> concat_inputs = {NewValueNode(std::make_shared<Primitive>(prim::kPrimConcat->name())),
                                           reshape, ori_outputs[0]};
  auto concat_op = graph->NewCNode(concat_inputs);
  MS_EXCEPTION_IF_NULL(concat_op);
  std::vector<size_t> input = {AnfAlgo::GetOutputInferShape(node2, 0)[0] + 1, AnfAlgo::GetOutputInferShape(node2, 0)[1],
                               AnfAlgo::GetOutputInferShape(node2, 0)[2]};
  auto types = {AnfAlgo::GetOutputInferDataType(node1, 0)};
  auto shapes = {input};
  AnfAlgo::SetOutputInferTypeAndShape(types, shapes, concat_op.get());
  AnfAlgo::SetNodeAttr(kAttrN, MakeValue(2), concat_op);
  AnfAlgo::SetNodeAttr(kAttrDynInputSizes, MakeValue(std::vector<int>{2}), concat_op);
  AnfAlgo::SetNodeAttr("axis", MakeValue(0), concat_op);
  AnfAlgo::SetNodeAttr("is_backend_insert", MakeValue(true), concat_op);
  return concat_op;
}

AnfNodePtr AddConcatD(const FuncGraphPtr &graph, const AnfNodePtr &node1, const AnfNodePtr &node2) {
  MS_EXCEPTION_IF_NULL(graph);
  MS_EXCEPTION_IF_NULL(node1);
  MS_EXCEPTION_IF_NULL(node2);
  // concatd --> concat
  std::vector<AnfNodePtr> concat_inputs = {NewValueNode(std::make_shared<Primitive>(prim::kPrimConcat->name())), node1,
                                           node2};
  auto concat_op = graph->NewCNode(concat_inputs);
  MS_EXCEPTION_IF_NULL(concat_op);
  std::vector<size_t> input = {AnfAlgo::GetOutputInferShape(node1, 0)[0], AnfAlgo::GetOutputInferShape(node1, 0)[1],
                               AnfAlgo::GetOutputInferShape(node1, 0)[2] + AnfAlgo::GetOutputInferShape(node2, 0)[2]};
  auto types = {AnfAlgo::GetOutputInferDataType(node1, 0)};
  auto shapes = {input};
  AnfAlgo::SetOutputInferTypeAndShape(types, shapes, concat_op.get());
  AnfAlgo::SetNodeAttr(kAttrN, MakeValue(2), concat_op);
  AnfAlgo::SetNodeAttr(kAttrDynInputSizes, MakeValue(std::vector<int>{2}), concat_op);
  AnfAlgo::SetNodeAttr("axis", MakeValue(2), concat_op);
  AnfAlgo::SetNodeAttr("is_backend_insert", MakeValue(true), concat_op);
  return concat_op;
}

AnfNodePtr AddDwReduceSum(const FuncGraphPtr &graph, const AnfNodePtr &node1, const AnfNodePtr &node2) {
  // node1 : dynamic output
  // node2 : matmul
  MS_EXCEPTION_IF_NULL(graph);
  MS_EXCEPTION_IF_NULL(node1);
  MS_EXCEPTION_IF_NULL(node2);
  std::vector<AnfNodePtr> ori_outputs;
  CreateMultipleOutputsOfAnfNode(graph, node1, 5, &ori_outputs);
  // ReduceSumd
  std::vector<AnfNodePtr> reducesum_inputs = {NewValueNode(std::make_shared<Primitive>(prim::kPrimReduceSum->name())),
                                              node2};
  auto reduce_sumd = graph->NewCNode(reducesum_inputs);
  MS_EXCEPTION_IF_NULL(reduce_sumd);
  auto types = {AnfAlgo::GetOutputInferDataType(ori_outputs[0], 0)};
  auto shapes = {AnfAlgo::GetOutputInferShape(ori_outputs[0], 0)};
  AnfAlgo::SetOutputInferTypeAndShape(types, shapes, reduce_sumd.get());
  AnfAlgo::SetNodeAttr(kAttrAxis, MakeValue(std::vector<int>{0}), reduce_sumd);
  AnfAlgo::SetNodeAttr("keep_dims", MakeValue(false), reduce_sumd);
  AnfAlgo::SetNodeAttr("is_backend_insert", MakeValue(true), reduce_sumd);
  return reduce_sumd;
}

AnfNodePtr AddDbReduceSum(const FuncGraphPtr &graph, const AnfNodePtr &node1, const AnfNodePtr &node2) {
  // node1 : lstm output
  // node2 : dynamic output
  MS_EXCEPTION_IF_NULL(graph);
  MS_EXCEPTION_IF_NULL(node1);
  MS_EXCEPTION_IF_NULL(node2);
  std::vector<AnfNodePtr> ori_outputs;
  CreateMultipleOutputsOfAnfNode(graph, node2, 5, &ori_outputs);
  // ReduceSumd --> ReduceSum
  std::vector<AnfNodePtr> reducesum_inputs = {NewValueNode(std::make_shared<Primitive>(prim::kPrimReduceSum->name())),
                                              node1};
  auto reduce_sumd = graph->NewCNode(reducesum_inputs);
  MS_EXCEPTION_IF_NULL(reduce_sumd);
  auto types = {AnfAlgo::GetOutputInferDataType(ori_outputs[1], 0)};
  auto shapes = {AnfAlgo::GetOutputInferShape(ori_outputs[1], 0)};
  AnfAlgo::SetOutputInferTypeAndShape(types, shapes, reduce_sumd.get());
  AnfAlgo::SetNodeAttr(kAttrAxis, MakeValue(std::vector<int>{0, 1}), reduce_sumd);
  AnfAlgo::SetNodeAttr("keep_dims", MakeValue(false), reduce_sumd);
  AnfAlgo::SetNodeAttr("is_backend_insert", MakeValue(true), reduce_sumd);
  return reduce_sumd;
}

const AnfNodePtr DynamicRNNGradFission::Process(const FuncGraphPtr &func_graph, const AnfNodePtr &node,
                                                const EquivPtr &) const {
  MS_EXCEPTION_IF_NULL(func_graph);
  MS_EXCEPTION_IF_NULL(node);
  auto cnode = node->cast<CNodePtr>();
  MS_EXCEPTION_IF_NULL(cnode);
  if (cnode->size() < kDynamicRNNGradInputNum + 1) {
    MS_LOG(INFO) << "The input num of DynamicRNNGrad is less than " << kDynamicRNNGradInputNum
                 << ". The node should not be changed";
    return nullptr;
  }
  // input_list of dynamic_rnn_grad
  const auto &ori_inputs = cnode->inputs();
  // create split_vd
  auto split_vd = CreateSplitVD(func_graph, ori_inputs[7]);
  // create concat_1
  auto h_concat = AddHConcatD(func_graph, ori_inputs[5], split_vd);
  // create concat_2
  auto concat = AddConcatD(func_graph, ori_inputs[1], h_concat);
  // create lstm_input_grad
  auto lstm_input_grad = CreateLSTMInputGrad(func_graph, cnode);
  std::vector<AnfNodePtr> lstm_outputs;
  CreateMultipleOutputsOfAnfNode(func_graph, lstm_input_grad, kLSTMInputGradOutputNum, &lstm_outputs);
  // create matmul
  auto batch_matmul = CreateBatchMatMul(func_graph, lstm_outputs[3], concat);
  // create reduce_sum_1
  auto dw_reduce_sum = AddDwReduceSum(func_graph, node, batch_matmul);
  // create reduce_sum_2
  auto db_reduce_sum = AddDbReduceSum(func_graph, lstm_outputs[3], node);
  std::vector<AnfNodePtr> make_tuple_inputs = {NewValueNode(prim::kPrimMakeTuple),
                                               dw_reduce_sum,
                                               db_reduce_sum,
                                               lstm_outputs[0],
                                               lstm_outputs[1],
                                               lstm_outputs[2]};
  auto make_tuple = func_graph->NewCNode(make_tuple_inputs);
  MS_EXCEPTION_IF_NULL(make_tuple);
  return make_tuple;
}
}  // namespace opt
}  // namespace mindspore

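To make the fission above easier to follow, here is the shape arithmetic it encodes, under assumed dimensions t (time steps), b (batch), i (input size), and h (hidden size). The numbers mirror the SetOutputInferTypeAndShape calls in the pass rather than any MindSpore API:

```cpp
// Shape walk-through of DynamicRNNGradFission: SplitV cuts the h states into
// {t-1, b, h} and {1, b, h}; AddHConcatD rebuilds {t, b, h}; AddConcatD joins
// x with the h states along the last axis into {t, b, i+h}; CreateBatchMatMul
// (transpose_x1 = true) computes concat^T x dgate, giving {t, i+h, 4h}.
#include <cstdio>

int main() {
  const int t = 8, b = 32, i = 64, h = 128;
  int split0[3] = {t - 1, b, h};                     // SplitV, first output
  int split1[3] = {1, b, h};                         // SplitV, second output
  int h_concat[3] = {split1[0] + split0[0], b, h};   // == {t, b, h}
  int concat[3] = {t, b, i + h};                     // x joined with h states
  int dgate[3] = {t, b, 4 * h};                      // LSTMInputGrad's reshaped output
  int matmul[3] = {concat[0], concat[2], dgate[2]};  // {t, i+h, 4h}
  std::printf("h_concat {%d,%d,%d}, batch_matmul {%d,%d,%d}\n", h_concat[0], h_concat[1], h_concat[2], matmul[0],
              matmul[1], matmul[2]);
  return 0;
}
```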
@ -0,0 +1,33 @@
/**
 * Copyright 2020 Huawei Technologies Co., Ltd
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
#ifndef MINDSPORE_CCSRC_BACKEND_OPTIMIZER_ASCEND_IR_FISSION_DYNAMIC_RNN_GRAD_FISSION_H_
#define MINDSPORE_CCSRC_BACKEND_OPTIMIZER_ASCEND_IR_FISSION_DYNAMIC_RNN_GRAD_FISSION_H_

#include "backend/optimizer/common/optimizer.h"

namespace mindspore {
namespace opt {
class DynamicRNNGradFission : public PatternProcessPass {
 public:
  explicit DynamicRNNGradFission(bool multigraph = true) : PatternProcessPass("dynamic_rnn_grad_fission", multigraph) {}
  ~DynamicRNNGradFission() override = default;
  const BaseRef DefinePattern() const override;
  const AnfNodePtr Process(const FuncGraphPtr &, const AnfNodePtr &, const EquivPtr &) const override;
};
}  // namespace opt
}  // namespace mindspore

#endif  // MINDSPORE_CCSRC_BACKEND_OPTIMIZER_ASCEND_IR_FISSION_DYNAMIC_RNN_GRAD_FISSION_H_

@ -21,11 +21,10 @@

namespace mindspore {
namespace opt {
const std::set<std::pair<string, string>> invalid_formats_pair = {{kOpFormat_C1HWNCoC0, kOpFormat_NCHW},
                                                                  {kOpFormat_NCHW, kOpFormat_C1HWNCoC0},
                                                                  {kOpFormat_C1HWNCoC0, kOpFormat_DEFAULT},
                                                                  {kOpFormat_DEFAULT, kOpFormat_FRACTAL_ZN_LSTM},
                                                                  {kOpFormat_DEFAULT, kOpFormat_C1HWNCoC0}};
const std::set<std::pair<string, string>> invalid_formats_pair = {
  {kOpFormat_C1HWNCoC0, kOpFormat_NCHW},          {kOpFormat_NCHW, kOpFormat_C1HWNCoC0},
  {kOpFormat_C1HWNCoC0, kOpFormat_DEFAULT},       {kOpFormat_DEFAULT, kOpFormat_FRACTAL_ZN_LSTM},
  {kOpFormat_FRACTAL_ZN_LSTM, kOpFormat_DEFAULT}, {kOpFormat_DEFAULT, kOpFormat_C1HWNCoC0}};

bool TransDataSplit::Run(const FuncGraphPtr &func_graph) {
  MS_EXCEPTION_IF_NULL(func_graph);

@ -83,6 +82,9 @@ bool TransDataSplit::DoSplit(const FuncGraphPtr &func_graph, const AnfNodePtr &n
    new_transpose_node = NewTransOpNode(func_graph, AnfAlgo::GetInputNode(node->cast<CNodePtr>(), 0), kernel_select_,
                                        false, prim::kPrimTranspose->name());
    AnfAlgo::SetNodeAttr(kAttrPerm, MakeValue(std::vector<int>{2, 3, 1, 0}), new_transpose_node);
    if (output_format == kOpFormat_FRACTAL_ZN_LSTM) {
      AnfAlgo::SetNodeAttr("nop_op", MakeValue(true), new_transpose_node);
    }
    RefreshKernelBuildInfo(input_format, kOpFormat_HWCN, new_transpose_node);

    // trans hwcn to output_format

@ -404,7 +404,11 @@ bool IsNopNode(const AnfNodePtr &node) {
  }
  CNodePtr cnode = node->cast<CNodePtr>();
  MS_EXCEPTION_IF_NULL(cnode);
  if (nop_nodes.find(AnfAlgo::GetCNodeName(cnode)) == nop_nodes.end()) {
  bool is_nop_node = false;
  if (AnfAlgo::HasNodeAttr("nop_op", cnode)) {
    is_nop_node = AnfAlgo::GetNodeAttr<bool>(cnode, "nop_op");
  }
  if (nop_nodes.find(AnfAlgo::GetCNodeName(cnode)) == nop_nodes.end() && !is_nop_node) {
    return false;
  }
  return true;

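A condensed restatement of the updated IsNopNode logic: a node counts as a nop if its name is in the static nop set or it carries a "nop_op" attribute set to true (the attribute the passes earlier in this change stamp onto their inserted Reshape and Transpose nodes). The nop-set contents below are assumed for illustration only:

```cpp
// Attribute-or-name nop check, mirroring the two-condition test above.
#include <iostream>
#include <map>
#include <set>
#include <string>

bool IsNopNode(const std::string &name, const std::map<std::string, bool> &attrs) {
  static const std::set<std::string> nop_nodes = {"Reshape", "ExpandDims", "Squeeze", "Flatten"};  // assumed set
  bool is_nop_node = false;
  auto it = attrs.find("nop_op");
  if (it != attrs.end()) {
    is_nop_node = it->second;  // attribute value wins when present
  }
  return nop_nodes.count(name) != 0 || is_nop_node;
}

int main() {
  std::cout << IsNopNode("Transpose", {{"nop_op", true}}) << '\n';  // 1: attribute marks it a nop
  std::cout << IsNopNode("Transpose", {}) << '\n';                  // 0: name alone is not enough
}
```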
@ -26,7 +26,7 @@ namespace mindspore {
namespace memreuse {
enum RefCountType { kDynamicRefCount, kStaticRefCount };
enum NodeType { kCommonNode, kCommunicationNode };
enum KernelRefType { kCommon, kRefNodeInput, kRefNodeOutput, kCommNotReuse, kCommReuse, kSummary };
enum KernelRefType { kCommon, kRefNodeInput, kRefNodeOutput, kCommNotReuse, kCommReuse, kSummary, kAllocated };
static constexpr int kInitIndex = -1;
class KernelRefCount {
 public:

@ -76,6 +76,9 @@ bool MemReuseUtil::InitDynamicOutputKernelRef() {
      }
    } else {
      kernel_ref->type_ = kCommon;
      if (AnfAlgo::OutputAddrExist(kernel_cnode, output_index)) {
        kernel_ref->type_ = kAllocated;
      }
    }
  }
  kernel_refs.push_back(kernel_ref);

@ -16,6 +16,7 @@
#include "backend/optimizer/mem_reuse/mem_reuse_allocator.h"
#include "backend/optimizer/mem_reuse/mem_reuse.h"
#include "backend/optimizer/mem_reuse/mem_reuse_checker.h"
#include "utils/ms_context.h"
#ifdef ENABLE_D
#include "runtime/device/ascend/ascend_stream_assign.h"
#endif

@ -32,12 +33,22 @@ void BestFitMemReuse::InitMemReuseInfo(const MemReuseUtil *mem_reuse_util_ptr) {
  set_workspace_ptr_list(mem_reuse_util_ptr->total_wk_ref_list());
  set_op_ptr_list(mem_reuse_util_ptr->kernel_def_ptr_list());
  // check info correctness
  auto context_ptr = MsContext::GetInstance();
  MS_EXCEPTION_IF_NULL(context_ptr);
  std::string device_target = context_ptr->get_param<std::string>(MS_CTX_DEVICE_TARGET);
  // CPU device does not need to be aligned
  if (device_target != kCPUDevice) {
    MS_LOG(DEBUG) << "Align Memory Size";
    for (auto &tensor : tensor_ptr_list_) {
      tensor->size_ = AlignCommonMemorySize(tensor->size_);
    }
  // align wk size to 512 && refcount == 1
    // align wk size to 512
    for (auto &wk : wk_tensor_list_) {
      wk->size_ = AlignCommonMemorySize(wk->size_);
    }
  }
  // set wk refcount == 1
  for (auto &wk : wk_tensor_list_) {
    wk->ref_count_ = 1;
  }
#ifdef ENABLE_D

@ -155,6 +166,9 @@ void BestFitMemReuse::AssignCommonNodeOutputOffset() {
      // get align size for communication op's single input
      tensor_desc->size_ = AlignCommunicationMemorySize(tensor_desc->size_);
      total_comm_reuse_size += tensor_desc->size_;
    } else if (tensor_desc->type_ == kAllocated) {
      // no need to allocate memory that has already been allocated
      continue;
    }

    auto reusable_membuf_map = GetReusableMembufMap(tensor_desc->size_);

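The alignment applied above rounds every tensor and workspace size up to a fixed unit before reuse planning, so that reused buffers always satisfy the device's allocation granularity. The real unit lives in AlignCommonMemorySize; 512 bytes is assumed in this sketch:

```cpp
// Round-up alignment sketch; kAlign = 512 is an assumption for illustration.
#include <cstddef>
#include <cstdio>

constexpr std::size_t kAlign = 512;  // assumed alignment unit

std::size_t AlignCommonMemorySize(std::size_t size) {
  return ((size + kAlign - 1) / kAlign) * kAlign;  // round up to a multiple of kAlign
}

int main() {
  std::printf("%zu -> %zu\n", static_cast<std::size_t>(100), AlignCommonMemorySize(100));    // 100 -> 512
  std::printf("%zu -> %zu\n", static_cast<std::size_t>(1024), AlignCommonMemorySize(1024));  // 1024 -> 1024
}
```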
@ -516,6 +516,10 @@ std::vector<size_t> AnfRuntimeAlgorithm::GetInputDeviceShape(const AnfNodePtr &n
  if (trans::IsNeedPadding(format, infer_shape.size())) {
    infer_shape = trans::PaddingShapeTo4d(infer_shape, GetInputReshapeType(node, input_idx));
  }
  if (node->isa<CNode>() && GetCNodeName(node) == kExtractImagePatchesOpName) {
    auto shape_tmp = {infer_shape[0], infer_shape[3], infer_shape[1], infer_shape[2]};
    return trans::TransShapeToDevice(shape_tmp, format);
  }
  return trans::TransShapeToDevice(infer_shape, format);
}

@ -179,6 +179,7 @@ GraphId AscendSession::CompileGraph(NotNull<FuncGraphPtr> func_graph) {
    debugger_->PreExecute(root_graph);
  }
#endif
  SetSummaryNodes(root_graph.get());
  // alloc mem
  MemoryAlloc(root_graph.get());
  // generate and load task into device

@ -75,6 +75,16 @@ GraphId CPUSession::CompileGraph(const AnfNodePtrList &lst, const AnfNodePtrList
#endif
  MS_LOG(INFO) << "Build kernel";
  BuildKernel(graph.get());

  // Set graph execution order before memory alloc, to ensure memory alloc follows the reordered graph
  auto execution_order = graph->execution_order();
  Reorder(&execution_order);
  graph->set_execution_order(execution_order);

  MS_LOG(INFO) << "Start kernel runtime";
  if (!runtime_.Init()) {
    MS_LOG(EXCEPTION) << "CPU start kernel runtime failed";
  }
  MS_LOG(INFO) << "Assign kernel address";
  runtime_.AssignKernelAddress(graph.get());
  return graph_id;

@ -98,11 +108,8 @@ void CPUSession::RunGraph(const GraphId &graph_id, const std::vector<tensor::Ten
#endif

  MS_LOG(INFO) << "Run graph start";
  auto execution_order = kernel_graph->execution_order();
  Reorder(&execution_order);

  bool enable_summary = summary_callback_ != nullptr;
  kernel_graph->set_execution_order(execution_order);
  NamedSummaryOutputs summary_outputs;
  if (enable_summary) {
    SetSummaryNodes(kernel_graph.get());

@ -125,7 +125,7 @@ class CreateCommGroupTask : public Task {
  void Run() override;
  std::string group_name_;
  std::vector<uint32_t> ranks_;
  bool result_;
  bool result_{false};
};

class DestroyCommGroupTask : public Task {

@ -134,7 +134,7 @@ class DestroyCommGroupTask : public Task {
  ~DestroyCommGroupTask() override = default;
  void Run() override;
  std::string group_name_;
  bool result_;
  bool result_{false};
};

class ExitTask : public Task {

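Why result_{false} matters: a plain bool member of a task object like the ones above is indeterminate until Run() assigns it, so reading it before the task executes is undefined behavior. A default member initializer closes that window; a minimal repro:

```cpp
// In-class default member initializer: result_ is deterministic even if the
// task's result is queried before Run() ever executes.
#include <iostream>

struct Task {
  bool result_{false};  // without {false}, this read-before-Run() is UB
  void Run() { result_ = true; }
};

int main() {
  Task t;
  std::cout << std::boolalpha << t.result_ << '\n';  // false, never garbage
  t.Run();
  std::cout << t.result_ << '\n';  // true
}
```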
@ -64,8 +64,11 @@ void E2eDumpUtil::DumpGPUMemToFile(const std::string &file_path, const std::stri
                                   const ShapeVector &int_shapes, const TypeId &type, size_t slot, Debugger *debugger) {
#ifdef ENABLE_DEBUGGER
  auto format = kOpFormat_DEFAULT;
  MS_EXCEPTION_IF_NULL(debugger);
  DebugServices *debug_services = debugger->debug_services();
  MS_EXCEPTION_IF_NULL(debug_services);
  TensorLoader *tensor_loader = debug_services->tensor_loader();
  MS_EXCEPTION_IF_NULL(tensor_loader);
  auto ret = tensor_loader->DumpTensorToFile(original_kernel_name, trans_flag, file_path, format, int_shapes, type,
                                             addr->type_id(), addr->format(), slot);
  if (!ret) {

@ -309,7 +309,7 @@ void Debugger::PostDebugOp() {
  }
}

std::map<std::pair<uint32_t, uint32_t>, std::string> &Debugger::GetStreamTaskToOpnameMap() {
std::map<std::pair<uint32_t, uint32_t>, std::string> Debugger::GetStreamTaskToOpnameMap() const {
  return stream_task_to_opname_;
}

@ -754,7 +754,8 @@ uint64_t BytestoInt64(const std::vector<char> &buffer) {
  uint64_t ret;

  ret = ((uint64_t)buffer[7] << 56) | ((uint64_t)buffer[6] << 48) | ((uint64_t)buffer[5] << 40) |
        ((uint64_t)buffer[4] << 32) | (buffer[3] << 24) | (buffer[2] << 16) | (buffer[1] << 8) | buffer[0];
        ((uint64_t)buffer[4] << 32) | ((uint64_t)buffer[3] << 24) | ((uint64_t)buffer[2] << 16) |
        ((uint64_t)buffer[1] << 8) | ((uint64_t)buffer[0]);

  return ret;
}

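The corrected expression promotes every byte to uint64_t before shifting; without the cast, buffer[3] << 24 shifts a char promoted to int, which can overflow (and sign-extend) for byte values of 0x80 and above. The sketch below checks the same little-endian reassembly; it additionally routes each byte through unsigned char, a stricter precaution against sign extension of negative char values than the patch itself takes:

```cpp
// Little-endian 8-byte reassembly with explicit width control.
#include <cstdint>
#include <iostream>
#include <vector>

uint64_t BytestoInt64(const std::vector<char> &buffer) {
  uint64_t ret = 0;
  for (int i = 7; i >= 0; --i) {
    // unsigned char first, so a byte like 0x80 contributes 0x80, not 0xFF...80
    ret = (ret << 8) | static_cast<uint64_t>(static_cast<unsigned char>(buffer[i]));
  }
  return ret;
}

int main() {
  std::vector<char> bytes = {'\x01', '\x00', '\x00', '\x80', '\x00', '\x00', '\x00', '\x00'};
  std::cout << std::hex << BytestoInt64(bytes) << '\n';  // 80000001
}
```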
@ -94,7 +94,7 @@ class Debugger : public std::enable_shared_from_this<Debugger> {

  int32_t step_num() const;

  std::map<std::pair<uint32_t, uint32_t>, std::string> &GetStreamTaskToOpnameMap();
  std::map<std::pair<uint32_t, uint32_t>, std::string> GetStreamTaskToOpnameMap() const;

  // check if any feature that uses the debugger backend is enabled
  bool DebuggerBackendEnabled();

@ -79,9 +79,9 @@ std::vector<std::string> ChunkString(std::string str, int graph_size) {
    }
    std::string buffer;
    buffer.resize(chunk_size);
    errno_t err = memcpy_s(reinterpret_cast<char *>(buffer.data()), chunk_size, str.data() + size_iter, chunk_size);
    auto err = memcpy_s(reinterpret_cast<char *>(buffer.data()), chunk_size, str.data() + size_iter, chunk_size);
    if (err != 0) {
      MS_LOG(ERROR) << "memcpy_s failed. err code is: " << err;
      MS_LOG(EXCEPTION) << "memcpy_s failed. errorno is: " << err;
    }
    buf.push_back(buffer);
    size_iter += CHUNK_SIZE;

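A behavior-level sketch of the chunking loop above, using std::string::substr in place of memcpy_s (substr cannot fail the way memcpy_s can, which is why the patched code escalates a nonzero error code from MS_LOG(ERROR) to MS_LOG(EXCEPTION)). The chunk size is a plain parameter here; the real code uses the CHUNK_SIZE constant defined elsewhere in the file:

```cpp
// String chunking without the manual buffer management of the original.
#include <iostream>
#include <string>
#include <vector>

std::vector<std::string> ChunkString(const std::string &str, std::size_t chunk_size) {
  std::vector<std::string> buf;
  for (std::size_t pos = 0; pos < str.size(); pos += chunk_size) {
    buf.push_back(str.substr(pos, chunk_size));  // last chunk may be shorter
  }
  return buf;
}

int main() {
  for (const auto &c : ChunkString("abcdefgh", 3)) {
    std::cout << c << '\n';  // abc / def / gh
  }
}
```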
@ -255,7 +255,7 @@ class PynativeEliminater : public OptimizerCaller {
    MS_LOG(DEBUG) << "Start FillZero";
    ValuePtr out = nullptr;
    if (value->isa<Int32Imm>()) {
      return value;
      return MakeValue(value->cast<Int32ImmPtr>()->value());
    }

    if (value->isa<tensor::Tensor>()) {

@ -298,9 +298,10 @@ class PynativeEliminater : public OptimizerCaller {
    if (rep != nullptr) {
      if (rep->isa<ValueNode>()) {
        auto value_node = rep->cast<ValueNodePtr>();
        value_node->set_value(FillZero(value_node->value()));
        auto new_value_node = NewValueNode(FillZero(value_node->value()));
        new_value_node->set_has_new_value(value_node->has_new_value());
        MS_LOG(DEBUG) << "Zeros_like replace ok " << rep->DebugString(4);
        return rep;
        return new_value_node;
      }
    }
  }

@ -315,9 +316,10 @@ class PynativeEliminater : public OptimizerCaller {
    if (rep != nullptr) {
      if (rep->isa<ValueNode>()) {
        auto value_node = rep->cast<ValueNodePtr>();
        value_node->set_value(FillZero(value_node->value()));
        auto new_value_node = NewValueNode(FillZero(value_node->value()));
        new_value_node->set_has_new_value(value_node->has_new_value());
        MS_LOG(DEBUG) << "Zeros_like replace ok 2 " << rep->DebugString(4);
        return rep;
        return new_value_node;
      }
    }
  }

@ -69,7 +69,7 @@ PYBIND_REGISTER(Tensor, 0, ([](const py::module *m) {
                   auto &tensor = py::cast<Tensor &>(t);
                   if (tensor.type() == DataType::DE_STRING) {
                     py::array res;
                     tensor.GetDataAsNumpyStrings(&res);
                     THROW_IF_ERROR(tensor.GetDataAsNumpyStrings(&res));
                     return res;
                   }
                   py::buffer_info info;

@ -30,6 +30,8 @@ namespace mindspore {
namespace dataset {
std::unique_ptr<Services> Services::instance_ = nullptr;
std::once_flag Services::init_instance_flag_;
std::set<std::string> Services::unique_id_list_ = {};
std::mutex Services::unique_id_mutex_;

#if !defined(_WIN32) && !defined(_WIN64) && !defined(__ANDROID__) && !defined(ANDROID)
std::string Services::GetUserName() {

@ -52,9 +54,24 @@ std::string Services::GetUniqueID() {
  std::mt19937 gen = GetRandomDevice();
  std::uniform_int_distribution<uint32_t> dist(0, kStr.size() - 1);
  char buffer[UNIQUEID_LEN];
  {
    std::unique_lock<std::mutex> lock(unique_id_mutex_);
    while (true) {
      auto ret = memset_s(buffer, UNIQUEID_LEN, 0, UNIQUEID_LEN);
      if (ret != 0) {
        MS_LOG(ERROR) << "memset_s error, errorno(" << ret << ")";
        return std::string("");
      }
      for (int i = 0; i < UNIQUEID_LEN; i++) {
        buffer[i] = kStr[dist(gen)];
      }
      if (unique_id_list_.find(std::string(buffer, UNIQUEID_LEN)) != unique_id_list_.end()) {
        continue;
      }
      unique_id_list_.insert(std::string(buffer, UNIQUEID_LEN));
      break;
    }
  }
  return std::string(buffer, UNIQUEID_LEN);
}

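A standalone sketch of the de-duplicated ID generation above: draw random characters, retry when the candidate already exists, and keep the whole check-and-insert under one lock so concurrent callers cannot race between the lookup and the insertion. The alphabet and length below are assumptions, not the real kStr and UNIQUEID_LEN:

```cpp
// Thread-safe unique ID generator with a retry-on-collision loop.
#include <cstddef>
#include <iostream>
#include <mutex>
#include <random>
#include <set>
#include <string>

std::string GetUniqueID() {
  static const std::string kStr = "0123456789abcdef";  // assumed alphabet
  static const std::size_t kLen = 8;                   // assumed ID length
  static std::set<std::string> unique_id_list;
  static std::mutex unique_id_mutex;
  static std::mt19937 gen{std::random_device{}()};
  std::uniform_int_distribution<std::size_t> dist(0, kStr.size() - 1);
  std::lock_guard<std::mutex> lock(unique_id_mutex);  // guards the set and the generator
  while (true) {
    std::string candidate;
    for (std::size_t i = 0; i < kLen; i++) {
      candidate += kStr[dist(gen)];
    }
    if (unique_id_list.insert(candidate).second) {  // insert() fails on a collision
      return candidate;
    }
  }
}

int main() {
  std::cout << GetUniqueID() << '\n' << GetUniqueID() << '\n';  // two distinct IDs
}
```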
@ -19,6 +19,7 @@
#include <algorithm>
#include <memory>
#include <mutex>
#include <set>
#include <string>
#include <vector>
#include "minddata/dataset/util/memory_pool.h"

@ -97,6 +98,8 @@ class Services {
 private:
  static std::once_flag init_instance_flag_;
  static std::unique_ptr<Services> instance_;
  static std::set<std::string> unique_id_list_;
  static std::mutex unique_id_mutex_;
  // A small pool used for small objects that last until the
  // Services Manager shuts down. Used by all sub-services.
  std::shared_ptr<MemoryPool> pool_;

@ -1790,38 +1790,92 @@ bool UpdateFuncGraphFlags(py::object obj, const FuncGraphPtr &func_graph) {
  return true;
}

// Generate and copy a ValueNode, or a CNode with its child nodes
static AnfNodePtr CopyNodesFromParamDefaultValue(const FuncGraphPtr func_graph, const AnfNodePtr &param_node) {
  MS_EXCEPTION_IF_NULL(param_node);
  if (param_node->isa<ValueNode>()) {
    return std::make_shared<ValueNode>(param_node->cast<ValueNodePtr>()->value());
  }

  // Parameter default value is a CNode.
  std::size_t index = 0;
  std::vector<AnfNodePtr> old_cnodes;
  old_cnodes.emplace_back(param_node);
  auto res = func_graph->NewCNode({});
  std::vector<CNodePtr> new_cnodes;
  new_cnodes.emplace_back(res);
  while (index < old_cnodes.size()) {
    auto current = old_cnodes[index];
    auto current_new_cnode = new_cnodes[index];
    index++;
    MS_EXCEPTION_IF_NULL(current);
    if (current->isa<CNode>()) {
      auto &inputs = current->cast<CNodePtr>()->inputs();
      for (auto it = inputs.begin(); it != inputs.end(); it++) {
        AnfNodePtr input = *it;
        if (input != nullptr && input->isa<CNode>()) {
          old_cnodes.emplace_back(input);
          auto new_cnode = func_graph->NewCNode({});
          new_cnodes.emplace_back(new_cnode);
          current_new_cnode->add_input(new_cnode);
        } else if (input->isa<ValueNode>()) {
          current_new_cnode->add_input(std::make_shared<ValueNode>(input->cast<ValueNodePtr>()->value()));
        } else {
          MS_LOG(EXCEPTION) << "Wrong type item in default parameters: " << input->ToString();
        }
      }
    }
  }
  return res;
}

FuncGraphPtr MakeTopGraph(const py::object &cell, const ValuePtr &cell_ptr) {
  auto current_graph = dyn_cast<FuncGraph>(cell_ptr);
  if (current_graph == nullptr) {
    MS_LOG(EXCEPTION) << "Current graph cast failed from " << cell_ptr->ToString();
  }

  auto func_graph = std::make_shared<FuncGraph>();
  func_graph->debug_info()->set_name("top");
  func_graph->debug_info()->set_name(current_graph->debug_info()->name() + "_wrapper");

  // def top(*arg, *kwargs):
  auto param_vargs = func_graph->add_parameter();
  auto args_name = "args";
  param_vargs->set_name(args_name);
  param_vargs->debug_info()->set_name(args_name);

  auto param_vkwargs = func_graph->add_parameter();
  args_name = "kwargs";
  param_vkwargs->set_name(args_name);
  param_vkwargs->debug_info()->set_name(args_name);

  func_graph->set_has_vararg(true);
  func_graph->set_has_kwarg(true);
  func_graph->set_kwonlyargs_count(0);
  // Copy all parameters information
  for (auto &para : current_graph->parameters()) {
    auto param = func_graph->add_parameter();
    auto orig_param = para->cast<ParameterPtr>();
    auto name = orig_param->name();
    param->set_name(name);
    param->debug_info()->set_name(name);
  }
  func_graph->set_has_vararg(current_graph->has_vararg());
  func_graph->set_has_kwarg(current_graph->has_kwarg());
  func_graph->set_kwonlyargs_count(current_graph->kwonlyargs_count());
  // Copy all default values
  for (auto &d : current_graph->parameter_default_value()) {
    func_graph->set_param_default_value(d.first, CopyNodesFromParamDefaultValue(func_graph, d.second));
  }

  // cell_obj
  MS_LOG(DEBUG) << "add Flag for " << std::string(py::str(cell));
  parse::UpdateFuncGraphFlags(cell, func_graph);
  // top graph's construct flag
  if (py::hasattr(cell, "construct")) {
    parse::UpdateFuncGraphFlags(cell.attr("construct"), func_graph);
  }

  auto unpacking = func_graph->has_vararg() || func_graph->has_kwarg();
  if (!unpacking) {
    std::vector<AnfNodePtr> inputs;
    inputs.emplace_back(NewValueNode(cell_ptr));
    auto &params = func_graph->parameters();
    (void)std::transform(params.begin(), params.end(), std::back_inserter(inputs),
                         [](AnfNodePtr node) -> AnfNodePtr { return node; });
    func_graph->set_output(func_graph->NewCNode(inputs));
  } else {
    // ret = cell_obj(*arg, *kwargs)
    auto call_fn = MakeUnpackCall(func_graph, NewValueNode(cell_ptr), {param_vargs, param_vkwargs});

    auto call_fn = MakeUnpackCall(func_graph, NewValueNode(cell_ptr), func_graph->parameters());
    // return ret
    func_graph->set_output(call_fn);
    MS_LOG(DEBUG) << "add Flag for " << std::string(py::str(cell));
  }
  return func_graph;
}
}  // namespace parse

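CopyNodesFromParamDefaultValue walks the original node graph and its copy in lockstep: two parallel worklists indexed by the same cursor, so old_cnodes[k] always corresponds to new_cnodes[k], and children are enqueued in both lists at the same time. The same technique on a toy node type (an illustrative stand-in, not MindSpore API):

```cpp
// Iterative lockstep copy of a node tree using parallel worklists.
#include <iostream>
#include <memory>
#include <vector>

struct Node {
  int value = 0;
  std::vector<std::shared_ptr<Node>> inputs;
};

std::shared_ptr<Node> CopyTree(const std::shared_ptr<Node> &root) {
  std::vector<std::shared_ptr<Node>> old_nodes{root};
  auto res = std::make_shared<Node>();
  res->value = root->value;
  std::vector<std::shared_ptr<Node>> new_nodes{res};
  std::size_t index = 0;
  while (index < old_nodes.size()) {
    auto current = old_nodes[index];
    auto current_new = new_nodes[index];  // same index, same position in the copy
    index++;
    for (const auto &input : current->inputs) {
      old_nodes.push_back(input);            // enqueue the original child...
      auto copy = std::make_shared<Node>();  // ...and its fresh counterpart
      copy->value = input->value;
      new_nodes.push_back(copy);
      current_new->inputs.push_back(copy);
    }
  }
  return res;
}

int main() {
  auto leaf = std::make_shared<Node>();
  leaf->value = 2;
  auto root = std::make_shared<Node>();
  root->value = 1;
  root->inputs.push_back(leaf);
  auto copy = CopyTree(root);
  std::cout << copy->inputs[0]->value << '\n';  // 2, reached via the copied child
}
```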
@ -260,7 +260,7 @@ py::object DoAutoCast(const py::object &arg, const TypeId &type_id) {
py::object DoParamMixPrecisionCast(bool *is_cast, const py::object obj) {
  auto tensor = py::cast<tensor::TensorPtr>(obj);
  auto cast_type = tensor->cast_dtype();
  py::object cast_output;
  py::object cast_output = obj;
  if (cast_type != nullptr) {
    auto source_element = tensor->Dtype();
    if (source_element != nullptr && IsSubType(source_element, kFloat) && *source_element != *cast_type) {

@ -282,12 +282,15 @@ py::object DoParamMixPrecisionCastTuple(bool *is_cast, const py::tuple tuple) {
      result[i] = DoParamMixPrecisionCast(is_cast, tuple[i]);
    } else if (py::isinstance<py::tuple>(tuple[i])) {
      result[i] = DoParamMixPrecisionCastTuple(is_cast, tuple[i]);
    } else {
      result[i] = tuple[i];
    }
  }
  return result;
}

bool GetSignatureType(const PrimitivePyPtr &prim, std::vector<SignatureEnumDType> *dtypes) {
  MS_EXCEPTION_IF_NULL(dtypes);
  auto signature = prim->signatures();
  bool has_sig_dtype = false;
  (void)std::transform(signature.begin(), signature.end(), std::back_inserter(*dtypes),

@ -733,20 +736,29 @@ ValuePtr PynativeExecutor::GetForwardValue(const OpExecInfoPtr &op_exec_info) {

AnfNodePtr PynativeExecutor::MakeCNode(const OpExecInfoPtr &op_exec_info, std::vector<bool> *op_masks,
                                       abstract::AbstractBasePtrList *args_spec_list) {
  MS_EXCEPTION_IF_NULL(op_masks);
  MS_EXCEPTION_IF_NULL(args_spec_list);
  CNodePtr cnode = nullptr;
  std::vector<AnfNodePtr> inputs;

  auto prim = op_exec_info->py_primitive;
  const auto &signature = prim->signatures();

  inputs.push_back(NewValueNode(prim));

  size_t size = op_exec_info->op_inputs.size();
  auto sig_size = signature.size();
  // ignore signature for cast op
  if (sig_size > 0 && sig_size != size) {
    MS_EXCEPTION(ValueError) << op_exec_info->op_name << " inputs size " << size << " does not match the required "
                             << "inputs size " << sig_size;
  }
  bool is_cast_op = (op_exec_info->op_name == "Cast");
  if (!is_cast_op) {
    const auto &signature = prim->signatures();
    for (size_t i = 0; i < size; i++) {
      auto obj = op_exec_info->op_inputs[i];
      auto sig = SignatureEnumRW::kRWDefault;
      if (signature.size() > 0) {
      if (sig_size > 0) {
        sig = signature[i].rw;
      }
      MS_LOG(DEBUG) << "check mix precision " << op_exec_info->op_name << " input " << i << " "

@ -1173,10 +1185,6 @@ py::tuple RunOp(const py::args &args) {
    std::ostringstream oss;
    trace::TraceGraphEval();
    trace::GetEvalStackInfo(oss);
    // call py::print to output function call stack to STDOUT, in case of output the log to file, the user can see
    // these info from screen, no need to open log file to find these info
    py::print(oss.str());
    MS_LOG(ERROR) << oss.str();
    PynativeExecutor::GetInstance()->Clean();
    // re-throw this exception to Python interpreter to handle it
    throw(py::error_already_set(ex));

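The DoParamMixPrecisionCast change follows a simple rule: initialize the output to the input and overwrite it only when a cast actually fires, so every return path yields a valid object instead of a default-constructed one. Distilled to its core, with strings standing in for py::object:

```cpp
// Initialize-then-maybe-overwrite: no return path can yield an unset value.
#include <iostream>
#include <string>

std::string MaybeCast(const std::string &obj, bool needs_cast) {
  std::string cast_output = obj;  // safe default: pass the input through unchanged
  if (needs_cast) {
    cast_output = obj + ":casted";  // overwrite only on the cast path
  }
  return cast_output;
}

int main() {
  std::cout << MaybeCast("tensor", false) << '\n';  // tensor
  std::cout << MaybeCast("tensor", true) << '\n';   // tensor:casted
}
```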
@ -406,8 +406,10 @@ void GPUProfiler::OpDataProducerBegin(const std::string op_name, void *stream) {
    CHECK_CUDA_RET_WITH_ERROR(cudaEventRecord(op_event_start_, (CUstream)stream_),
                              "cudaEventRecord op event start failed");
    op_host_time_start_ = GetHostTimeStamp();
    op_cupti_time_start_ = GetCUPTITimeStamp();
  } else {
    op_host_time_start_ = GetHostTimeStamp();
    op_cupti_time_start_ = GetCUPTITimeStamp();
  }
  SetRunTimeData(op_name, stream);
}

@ -431,7 +433,7 @@ void GPUProfiler::OpDataProducerEnd() {
  }
  MS_LOG(DEBUG) << "Host Time Elapsed(us)," << op_name_ << "," << op_time_elapsed;
  SetRunTimeData(op_name_, op_time_elapsed);
  SetRunTimeData(op_name_, op_host_time_start_, op_time_elapsed);
  SetRunTimeData(op_name_, op_cupti_time_start_, op_time_elapsed);
}

void GPUProfiler::StopCUPTI() {

@ -170,6 +170,7 @@ class GPUProfiler {
  cudaEvent_t op_event_stop_;
  uint64_t op_host_time_start_;
  uint64_t op_host_time_stop_;
  uint64_t op_cupti_time_start_;
  std::string profile_data_path_;
};
}  // namespace gpu

@ -455,7 +455,7 @@ REGISTER_PYBIND_DEFINE(Tensor, ([](const py::module *m) {
                             >>> data.set_dtype(mindspore.int32)
                             mindspore.int32
                             )mydelimiter")
                           .def("set_cast_dtype", &Tensor::set_cast_dtype)
                           .def("set_cast_dtype", &Tensor::set_cast_dtype, py::arg("dtype") = nullptr)
                           .def("__str__", &Tensor::ToString)
                           .def("__repr__", &Tensor::ToStringRepr)
                           .def(py::pickle(

@ -14,12 +14,12 @@ endif ()

if (ENABLE_CPU)
    file(GLOB_RECURSE CPU_SRC_LIST RELATIVE ${CMAKE_CURRENT_SOURCE_DIR} "cpu/*.cc")
    list(REMOVE_ITEM CPU_SRC_LIST "cpu/mpi/mpi_adapter.cc", "cpu/mpi/mpi_export.cc")
    list(REMOVE_ITEM CPU_SRC_LIST "cpu/mpi/mpi_adapter.cc" "cpu/mpi/mpi_export.cc")
endif ()

if (ENABLE_MPI)
    if (ENABLE_CPU)
        file(GLOB_RECURSE MPI_SRC_LIST "cpu/mpi/mpi_adapter.cc", "cpu/mpi/mpi_export.cc")
        file(GLOB_RECURSE MPI_SRC_LIST "cpu/mpi/mpi_adapter.cc" "cpu/mpi/mpi_export.cc")
        set_property(SOURCE ${MPI_SRC_LIST}
                     PROPERTY COMPILE_DEFINITIONS SUBMODULE_ID=mindspore::SubModuleId::SM_DEVICE)
        add_library(mpi_adapter SHARED ${MPI_SRC_LIST})

@ -57,7 +57,6 @@ constexpr const char *kOpTypeOpDebug = "Opdebug";
|
|||
namespace mindspore {
|
||||
namespace device {
|
||||
namespace ascend {
|
||||
|
||||
DataDumper::~DataDumper() {
|
||||
ReleaseDevMem(&dev_load_mem_);
|
||||
ReleaseDevMem(&dev_unload_mem_);
|
||||
|
@ -116,7 +115,7 @@ void DataDumper::LoadDumpInfo() {
|
|||
auto debugger = mindspore::Debugger::GetInstance();
|
||||
MS_EXCEPTION_IF_NULL(debugger);
|
||||
if (debugger->DebuggerBackendEnabled()) {
|
||||
std::map<std::pair<uint32_t, uint32_t>, std::string> &stream_task_to_opname = debugger->GetStreamTaskToOpnameMap();
|
||||
std::map<std::pair<uint32_t, uint32_t>, std::string> stream_task_to_opname = debugger->GetStreamTaskToOpnameMap();
|
||||
// extract stream id, task id and opname from runtime_info_map for overflow detection
|
||||
std::transform(runtime_info_map_.begin(), runtime_info_map_.end(),
|
||||
std::inserter(stream_task_to_opname, stream_task_to_opname.end()),
|
||||
|
|
|
@ -52,8 +52,12 @@ const int kUnSupportMixedDataTypeIndex = -1;
bool MatchInferOutputDataType(const CNodePtr &cnode, const kernel::KernelBuildInfo &kernel_build_info) {
  MS_EXCEPTION_IF_NULL(cnode);
  // Check input data type
  auto name = AnfAlgo::GetCNodeName(cnode);
  for (size_t input_index = 0; input_index < kernel_build_info.GetInputNum(); ++input_index) {
    TypeId input_origin_type = AnfAlgo::GetPrevNodeOutputInferDataType(cnode, input_index);
    if (name == kDynamicRNNOpName && input_origin_type == kMetaTypeNone) {
      continue;
    }
    if (kernel_build_info.GetInputDeviceType(input_index) != input_origin_type) {
      return false;
    }

@ -478,6 +482,9 @@ void SetTensorDeviceInfo(const kernel::KernelBuildInfo &selected_kernel_info, co
      AnfAlgo::SetSelectKernelBuildInfo(builder->Build(), input_kernel_node.get());
      continue;
    }
    if (selected_kernel_info.GetInputFormat(input_index) == kOpFormat_FRACTAL_ZN_LSTM) {
      continue;
    }
    // we set special device info of an input tensor.
    bool is_ref = false;
    auto op_info = kernel::OpLib::FindOp(AnfAlgo::GetCNodeName(kernel_node), kernel::kTBE);

@ -127,8 +127,12 @@ bool TaskGenerator::LaunchKernel(const CNodePtr &anf_node_ptr, uint32_t stream_i
  auto kernel_mod = AnfAlgo::GetKernelMod(anf_node_ptr);
  MS_EXCEPTION_IF_NULL(kernel_mod);
  kernel_mod->set_kernel_name(anf_node_ptr->fullname_with_scope());
  auto op_name = AnfAlgo::GetCNodeName(anf_node_ptr);
  if (AnfAlgo::GetCNodeName(anf_node_ptr) != kAtomicAddrCleanOpName) {
    for (size_t i = 0; i < AnfAlgo::GetInputTensorNum(anf_node_ptr); ++i) {
      if (op_name == kDynamicRNNOpName && i == 3) {
        continue;
      }
      auto real_input_index = AnfAlgo::GetRealInputIndex(anf_node_ptr, i);
      auto device_address = AnfAlgo::GetPrevNodeOutputAddr(anf_node_ptr, real_input_index);
      AddressPtr input = std::make_shared<Address>();

@ -22,22 +22,68 @@
|
|||
#include <functional>
|
||||
#include "backend/kernel_compiler/kernel.h"
|
||||
#include "runtime/device/cpu/cpu_device_address.h"
|
||||
#include "runtime/device/cpu/cpu_memory_manager.h"
|
||||
#include "utils/ms_context.h"
|
||||
#include "backend/session/anf_runtime_algorithm.h"
|
||||
#include "backend/session/session_basic.h"
|
||||
#include "frontend/operator/ops.h"
|
||||
#include "utils/shape_utils.h"
|
||||
#include "utils/profile.h"
|
||||
#ifdef MEM_REUSE_DEBUG
|
||||
#include "backend/optimizer/mem_reuse/mem_reuse_checker.h"
|
||||
#endif
|
||||
|
||||
namespace mindspore {
|
||||
namespace device {
|
||||
namespace cpu {
|
||||
|
||||
bool CPUKernelRuntime::Init()
|
||||
{
|
||||
mem_manager_ = std::make_shared<CPUMemoryManager>();
|
||||
MS_EXCEPTION_IF_NULL(mem_manager_);
|
||||
return true;
|
||||
}
|
||||
|
||||
const size_t INIT_NODE_REF = 1;
|
||||
void CPUKernelRuntime::AssignKernelAddress(session::KernelGraph *kernel_graph) {
|
||||
AssignValueNodeAddress(kernel_graph);
|
||||
AssignInputNodeAddress(kernel_graph);
|
||||
auto context_ptr = MsContext::GetInstance();
|
||||
MS_EXCEPTION_IF_NULL(context_ptr);
|
||||
bool is_enable_mem_reuse = context_ptr->get_param<bool>(MS_CTX_ENABLE_MEM_REUSE);
|
||||
if (is_enable_mem_reuse) {
|
||||
AssignStaticMemoryOutput(kernel_graph);
|
||||
AssignDynamicMemory(kernel_graph);
|
||||
#ifdef MEM_REUSE_DEBUG
|
||||
mindspore::memreuse::MemReuseChecker::GetInstance().CheckNormalIR(kernel_graph);
|
||||
#endif
|
||||
} else {
|
||||
AssignKernelOutputAddress(kernel_graph);
|
||||
resource_manager_.AssignMemory(kernel_graph);
|
||||
static_cast<CPUMemoryManager *>(mem_manager_.get())->AssignMemory(kernel_graph);
|
||||
}
|
||||
}
|
||||
|
||||
void CPUKernelRuntime::AssignStaticMemoryOutput(const session::KernelGraph *graph) {
|
||||
MS_EXCEPTION_IF_NULL(graph);
|
||||
auto nodes = AnfAlgo::GetAllOutput(graph->output(), {prim::kPrimTupleGetItem});
|
||||
std::vector<session::KernelWithIndex> non_communication_op;
|
||||
// Assign Communicate Op Memory firstly.
|
||||
for (const auto &node : nodes) {
|
||||
auto item_with_index = AnfAlgo::VisitKernelWithReturnType(node, 0, true);
|
||||
MS_EXCEPTION_IF_NULL(item_with_index.first);
|
||||
if (!item_with_index.first->isa<CNode>() || !AnfAlgo::IsRealKernel(item_with_index.first)) {
|
||||
continue;
|
||||
}
|
||||
if (AnfAlgo::IsCommunicationOp(item_with_index.first)) {
|
||||
AssignCommunicationNodeMem(kStaticMem, item_with_index.first);
|
||||
} else {
|
||||
non_communication_op.emplace_back(item_with_index);
|
||||
}
|
||||
}
|
||||
|
||||
for (const auto &item_with_index : non_communication_op) {
|
||||
AssignNodeOutputMem(kStaticMem, item_with_index.first, SizeToInt(item_with_index.second));
|
||||
}
|
||||
}
|
||||
|
||||
void CPUKernelRuntime::AssignValueNodeAddress(session::KernelGraph *kernel_graph) {
|
||||
|
@ -73,7 +119,8 @@ void CPUKernelRuntime::AssignValueNodeAddress(session::KernelGraph *kernel_graph
|
|||
tensor->data_type() == kNumberTypeInt64) {
|
||||
address->ptr_ = tensor->data_c();
|
||||
} else {
|
||||
address->ptr_ = resource_manager_.MemMalloc(tensor_size);
|
||||
MS_EXCEPTION_IF_NULL(mem_manager_);
|
||||
address->ptr_ = static_cast<CPUMemoryManager *>(mem_manager_.get())->MemMalloc(tensor_size);
|
||||
if (!address->SyncHostToDevice(data_shape, LongToSize(tensor->data().nbytes()), tensor->data_type(),
|
||||
tensor->data_c())) {
|
||||
MS_LOG(EXCEPTION) << "Value node sync host to device failed!";
|
||||
|
@ -166,10 +213,11 @@ tensor::TensorPtr CPUKernelRuntime::CreatTensorForOutput(session::KernelGraph *k
|
|||
tensor->set_sync_status(kNeedSyncDeviceToHostImmediately);
|
||||
} else {
|
||||
if (infer_type_id != device_type_id) {
|
||||
MS_EXCEPTION_IF_NULL(mem_manager_);
|
||||
size_t type_size = GetTypeByte(TypeIdToType(device_type_id));
|
||||
ShapeVector data_shape = tensor->shape();
|
||||
size_t tensor_size = std::accumulate(data_shape.begin(), data_shape.end(), type_size, std::multiplies<size_t>());
|
||||
address->ptr_ = resource_manager_.MemMalloc(tensor_size);
|
||||
address->ptr_ = static_cast<CPUMemoryManager *>(mem_manager_.get())->MemMalloc(tensor_size);
|
||||
tensor->set_device_address(address);
|
||||
tensor->set_sync_status(kNeedSyncDeviceToHostImmediately);
|
||||
} else {
|
||||
|
@ -241,10 +289,11 @@ void CPUKernelRuntime::BindInputOutput(session::KernelGraph *kernel_graph, const
|
|||
tensor->data_type() == kNumberTypeInt32 || tensor->data_type() == kNumberTypeInt64) {
|
||||
address->ptr_ = tensor->data_c();
|
||||
} else {
|
||||
MS_EXCEPTION_IF_NULL(mem_manager_);
|
||||
ShapeVector data_shape = tensor->shape();
|
||||
size_t tensor_size =
|
||||
std::accumulate(data_shape.begin(), data_shape.end(), sizeof(float), std::multiplies<size_t>());
|
||||
address->ptr_ = resource_manager_.MemMalloc(tensor_size);
|
||||
address->ptr_ = static_cast<CPUMemoryManager *>(mem_manager_.get())->MemMalloc(tensor_size);
|
||||
if (!address->SyncHostToDevice(data_shape, LongToSize(tensor->data().nbytes()), tensor->data_type(),
|
||||
tensor->data_c())) {
|
||||
MS_LOG(EXCEPTION) << "Parameter node sync host to device failed!";
|
||||
|
@@ -271,7 +320,8 @@ void CPUKernelRuntime::AddRuntimeAddress(DeviceAddress *address, std::vector<ker
  kernel::AddressPtr input = std::make_shared<kernel::Address>();
  MS_EXCEPTION_IF_NULL(input);
  if (address->ptr_ == nullptr) {
    address->ptr_ = resource_manager_.MemMalloc(address->size_);
    MS_EXCEPTION_IF_NULL(mem_manager_);
    address->ptr_ = static_cast<CPUMemoryManager *>(mem_manager_.get())->MemMalloc(address->size_);
  }
  MS_EXCEPTION_IF_NULL(address->ptr_);
  input->addr = address->ptr_;
@@ -280,16 +330,19 @@ void CPUKernelRuntime::AddRuntimeAddress(DeviceAddress *address, std::vector<ker
}

void CPUKernelRuntime::IncreaseSummaryRefCount(const session::NamedSummaryOutputs &summary_outputs) {
  resource_manager_.IncreaseSummaryRefCount(summary_outputs);
  MS_EXCEPTION_IF_NULL(mem_manager_);
  static_cast<CPUMemoryManager *>(mem_manager_.get())->IncreaseSummaryRefCount(summary_outputs);
}

void CPUKernelRuntime::DecreaseSummaryRefCount(const session::NamedSummaryOutputs &summary_outputs) {
  resource_manager_.DecreaseSummaryRefCount(summary_outputs);
  MS_EXCEPTION_IF_NULL(mem_manager_);
  static_cast<CPUMemoryManager *>(mem_manager_.get())->DecreaseSummaryRefCount(summary_outputs);
}

bool CPUKernelRuntime::Run(session::KernelGraph *kernel_graph, bool is_task_sink, Debugger *debugger) {
  MS_EXCEPTION_IF_NULL(kernel_graph);
  resource_manager_.IncreaseAddressRefCount(kernel_graph);
  MS_EXCEPTION_IF_NULL(mem_manager_);
  static_cast<CPUMemoryManager *>(mem_manager_.get())->IncreaseAddressRefCount(kernel_graph);

  auto kernels = kernel_graph->execution_order();
  for (const auto &kernel : kernels) {

@@ -319,7 +372,7 @@ bool CPUKernelRuntime::Run(session::KernelGraph *kernel_graph, bool is_task_sink
      AddRuntimeAddress(device_address, &kernel_workspaces);
    }
    auto ret = kernel_mod->Launch(kernel_inputs, kernel_workspaces, kernel_outputs, 0);
    resource_manager_.DecreaseAddressRefCount(kernel);
    static_cast<CPUMemoryManager *>(mem_manager_.get())->DecreaseAddressRefCount(kernel);
    if (!ret) {
      MS_LOG(EXCEPTION) << "Launch kernel failed.";
    }
@@ -24,7 +24,6 @@
#include "runtime/device/kernel_runtime.h"
#include "backend/session/kernel_graph.h"
#include "backend/session/session_basic.h"
#include "runtime/device/cpu/cpu_resource_manager.h"
#include "backend/session/anf_runtime_algorithm.h"
#include "utils/any.h"
namespace mindspore {

@@ -35,7 +34,7 @@ class CPUKernelRuntime : public KernelRuntime {
  CPUKernelRuntime() = default;
  ~CPUKernelRuntime() override = default;

  bool Init() override { return true; }
  bool Init() override;
  bool Run(session::KernelGraph *graph, bool is_task_sink, Debugger *debugger = nullptr) override;
  void AssignKernelAddress(session::KernelGraph *kernel_graph);
  void BindInputOutput(session::KernelGraph *kernel_graph, const std::vector<tensor::TensorPtr> &inputs,

@@ -49,6 +48,7 @@ class CPUKernelRuntime : public KernelRuntime {
                       TypeId type_id) override;

 private:
  void AssignStaticMemoryOutput(const session::KernelGraph *graph);
  tensor::TensorPtr CreatTensorForOutput(session::KernelGraph *kernel_graph, const CNodePtr &node, size_t index);

  BaseRef CreatTensorForOutput(session::KernelGraph *kernel_graph, const session::KernelWithIndex &kernel_with_index);

@@ -56,7 +56,6 @@ class CPUKernelRuntime : public KernelRuntime {
  void AssignInputNodeAddress(const session::KernelGraph *kernel_graph);
  void AssignKernelOutputAddress(const session::KernelGraph *kernel_graph);
  void AddRuntimeAddress(DeviceAddress *address, std::vector<kernel::AddressPtr> *input_list);
  CPUResourceManager resource_manager_;
  std::set<DeviceAddressPtr> bound_addresses_;
  std::map<AnfNodePtr, tensor::TensorPtr> input_param_tensor_map_;
};
@@ -1,5 +1,5 @@
/**
 * Copyright 2019 Huawei Technologies Co., Ltd
 * Copyright 2020 Huawei Technologies Co., Ltd
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.

@@ -13,15 +13,34 @@
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
#include "runtime/device/cpu/cpu_resource_manager.h"
#include "backend/session/anf_runtime_algorithm.h"

#include "runtime/device/cpu/cpu_memory_manager.h"
#include "backend/session/anf_runtime_algorithm.h"
#include "utils/ms_context.h"
#include "utils/convert_utils.h"
namespace mindspore {
namespace device {
namespace cpu {
CPUResourceManager::~CPUResourceManager() { MemFree(); }

void CPUResourceManager::MemFree() {
uint8_t *CPUMemoryManager::MallocStaticMem(size_t size, bool communication_mem) {
  // For cpu, static memory is the same as dynamic memory
  return MallocDynamicMem(size, communication_mem);
}

uint8_t *CPUMemoryManager::MallocDynamicMem(size_t size, bool) {
  return (uint8_t *)MemMalloc(size);
}

void CPUMemoryManager::ResetDynamicMemory() {
  for (auto &&iter : dynamic_mem_) {
    free(iter.first);
  }
  dynamic_mem_.clear();
}

CPUMemoryManager::~CPUMemoryManager() { MemFree(); }

void CPUMemoryManager::MemFree() {
  if (mem_ptr_ != nullptr) {
    free(mem_ptr_);
    mem_ptr_ = nullptr;
@@ -34,7 +53,7 @@ void CPUResourceManager::MemFree() {
  dynamic_mem_.clear();
}

void CPUResourceManager::AssignMemory(const session::KernelGraph *graph) {
void CPUMemoryManager::AssignMemory(const session::KernelGraph *graph) {
  size_t graph_mem_size = mem_plan_.MemPlan(graph);
  if (graph_mem_size > mem_size_) {
    if (mem_size_ > 0) {

@@ -43,6 +62,7 @@ void CPUResourceManager::AssignMemory(const session::KernelGraph *graph) {
    }
    mem_ptr_ = reinterpret_cast<uint8_t *>(malloc(graph_mem_size));
    if (mem_ptr_ != nullptr) {
      MS_LOG(INFO) << "Simple MemPlan GraphMemSize [" << graph_mem_size << "]";
      mem_size_ = graph_mem_size;
      dynamic_malloc_ = false;
    } else {

@@ -56,7 +76,7 @@ void CPUResourceManager::AssignMemory(const session::KernelGraph *graph) {
  mem_plan_.MemAssign(graph, mem_ptr_);
}

void *CPUResourceManager::MemMalloc(size_t mem_size) {
void *CPUMemoryManager::MemMalloc(size_t mem_size) {
  void *ptr = malloc(mem_size);
  if (ptr != nullptr) {
    memset_s(ptr, mem_size, 0, mem_size);
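AssignMemory above implements a plan-then-fallback strategy: a simple planner computes the whole graph's footprint, the manager tries to grab it as one arena, and only if that fails does it fall back to per-block dynamic allocation. A stripped-down sketch of that strategy (hypothetical `ArenaAllocator` class, not the MindSpore API):

```
class ArenaAllocator:
    def __init__(self):
        self.arena = None
        self.dynamic_malloc = True

    def assign(self, planned_size):
        try:
            self.arena = bytearray(planned_size)  # one big block for the whole graph
            self.dynamic_malloc = False
        except MemoryError:
            self.dynamic_malloc = True  # fall back: allocate per request instead
```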
@@ -67,7 +87,7 @@ void *CPUResourceManager::MemMalloc(size_t mem_size) {
  }
}

void CPUResourceManager::MemFree(void *ptr) {
void CPUMemoryManager::MemFree(void *ptr) {
  auto iter = dynamic_mem_.find(ptr);
  if (iter != dynamic_mem_.end()) {
    (void)dynamic_mem_.erase(iter);

@@ -75,7 +95,7 @@ void CPUResourceManager::MemFree(void *ptr) {
  }
}

void CPUResourceManager::IncreaseSummaryRefCount(const session::NamedSummaryOutputs &summary_outputs) {
void CPUMemoryManager::IncreaseSummaryRefCount(const session::NamedSummaryOutputs &summary_outputs) {
  if (!dynamic_malloc_) {
    return;
  }

@@ -93,7 +113,7 @@ void CPUResourceManager::IncreaseSummaryRefCount(const session::NamedSummaryOutp
  }
}

void CPUResourceManager::DecreaseSummaryRefCount(const session::NamedSummaryOutputs &summary_outputs) {
void CPUMemoryManager::DecreaseSummaryRefCount(const session::NamedSummaryOutputs &summary_outputs) {
  if (!dynamic_malloc_) {
    return;
  }

@@ -115,7 +135,7 @@ void CPUResourceManager::DecreaseSummaryRefCount(const session::NamedSummaryOutp
  }
}

void CPUResourceManager::IncreaseAddressRefCount(const session::KernelGraph *graph) {
void CPUMemoryManager::IncreaseAddressRefCount(const session::KernelGraph *graph) {
  if (!dynamic_malloc_) {
    return;
  }

@@ -140,7 +160,7 @@ void CPUResourceManager::IncreaseAddressRefCount(const session::KernelGraph *gra
  }
}

void CPUResourceManager::DecreaseAddressRefCount(const AnfNodePtr &kernel) {
void CPUMemoryManager::DecreaseAddressRefCount(const AnfNodePtr &kernel) {
  if (!dynamic_malloc_) {
    return;
  }
@@ -1,5 +1,5 @@
/**
 * Copyright 2019 Huawei Technologies Co., Ltd
 * Copyright 2020 Huawei Technologies Co., Ltd
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.

@@ -13,22 +13,27 @@
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
#ifndef MINDSPORE_CCSRC_RUNTIME_DEVICE_CPU_CPU_RESOURCE_MANAGER_H_
#define MINDSPORE_CCSRC_RUNTIME_DEVICE_CPU_CPU_RESOURCE_MANAGER_H_

#ifndef MINDSPORE_CCSRC_RUNTIME_DEVICE_CPU_CPU_MEMORY_MANAGER_H_
#define MINDSPORE_CCSRC_RUNTIME_DEVICE_CPU_CPU_MEMORY_MANAGER_H_
#include <vector>
#include <map>
#include "backend/session/kernel_graph.h"
#include "backend/session/session_basic.h"
#include "runtime/device/device_address.h"
#include "runtime/device/memory_manager.h"
#include "runtime/device/cpu/cpu_simple_mem_plan.h"
namespace mindspore {
namespace device {
namespace cpu {
class CPUResourceManager {
class CPUMemoryManager : public MemoryManager {
 public:
  CPUResourceManager() = default;
  ~CPUResourceManager();
  CPUMemoryManager() = default;
  virtual ~CPUMemoryManager();

  void MallocDeviceMemory() override {}
  void FreeDeviceMemory() override {}
  void ResetDynamicMemory() override;

  void AssignMemory(const session::KernelGraph *graph);
  void IncreaseAddressRefCount(const session::KernelGraph *graph);

@@ -38,6 +43,10 @@ class CPUResourceManager {
  void IncreaseSummaryRefCount(const session::NamedSummaryOutputs &summary_outputs);
  void DecreaseSummaryRefCount(const session::NamedSummaryOutputs &summary_outputs);

 protected:
  uint8_t *MallocStaticMem(size_t size, bool communication_mem) override;
  uint8_t *MallocDynamicMem(size_t size, bool communication_mem) override;

 private:
  void MemFree();
  CPUSimpleMemPlan mem_plan_;

@@ -50,5 +59,4 @@ class CPUResourceManager {
}  // namespace cpu
}  // namespace device
}  // namespace mindspore

#endif  // MINDSPORE_CCSRC_RUNTIME_DEVICE_CPU_CPU_RESOURCE_MANAGER_H_
#endif  // MINDSPORE_CCSRC_RUNTIME_DEVICE_CPU_CPU_MEMORY_MANAGER_H_
@@ -28,7 +28,7 @@ namespace mindspore {
namespace device {
namespace cpu {
class CPUSimpleMemPlan;
class CPUResourceManager;
class CPUMemoryManager;
class CPUKernelRuntime;
}  // namespace cpu
namespace ascend {

@@ -87,7 +87,7 @@ class DeviceAddress : public mindspore::DeviceSync {
  friend class MemoryManager;
  friend class mindspore::device::ascend::tasksink::TaskGenerator;
  friend class mindspore::device::cpu::CPUSimpleMemPlan;
  friend class mindspore::device::cpu::CPUResourceManager;
  friend class mindspore::device::cpu::CPUMemoryManager;
  friend class mindspore::device::cpu::CPUKernelRuntime;
  friend class mindspore::device::gpu::GPUKernelRuntime;
  friend class mindspore::device::gpu::GPUMemoryManager;
@@ -141,7 +141,6 @@ inline bool CheckNullInput(std::vector<size_t> input_shape) {
    MS_LOG(EXCEPTION) << "CUDA curand Error: " << message << " | curandStatus: " << status; \
  }                                                                                         \
}

}  // namespace gpu
}  // namespace device
}  // namespace mindspore
@@ -25,7 +25,6 @@
namespace mindspore {
namespace device {
namespace gpu {

MPIInitializer &MPIInitializer::GetInstance() {
  static MPIInitializer instance;
  return instance;
@@ -58,7 +58,23 @@ int AnfConverter::ValidateFileStr(const std::string &modelFile, std::string file

bool AnfConverter::ReadOnnxFromBinary(const std::string &modelFile, google::protobuf::Message *onnx_model) {
  std::unique_ptr<char> onnx_file(new (std::nothrow) char[PATH_MAX]{0});
  int fd = open(modelFile.c_str(), O_RDONLY);
  if (modelFile.size() > PATH_MAX) {
    MS_LOG(DEBUG) << "file path " << modelFile << " is too long.";
    return false;
  }
  char real_path[PATH_MAX + 1] = {0};
#if defined(_WIN32) || defined(_WIN64)
  if (nullptr == _fullpath(real_path, modelFile.c_str(), PATH_MAX)) {
    MS_LOG(DEBUG) << modelFile << " does not exist.";
    return false;
  }
#else
  if (nullptr == realpath(modelFile.c_str(), real_path)) {
    MS_LOG(DEBUG) << modelFile << " does not exist.";
    return false;
  }
#endif
  int fd = open(real_path, O_RDONLY);
  if (fd < 0) {
    MS_LOG(EXCEPTION) << "failed to open file";
  }
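The new version canonicalizes the user-supplied path before opening it, which both rejects over-long paths and fails fast on dangling ones. A minimal sketch of the same pattern in Python (hypothetical helper, not the converter's API):

```
import os

def read_model(model_file: str) -> bytes:
    # Resolve to a canonical absolute path first, mirroring realpath()/_fullpath().
    real_path = os.path.realpath(model_file)
    if not os.path.exists(real_path):
        raise FileNotFoundError(f"{model_file} does not exist")
    with open(real_path, "rb") as f:
        return f.read()
```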
@@ -100,28 +100,27 @@ bool ProcessInner(const std::string &message, const int &bias, uint32_t *digest,
    w[i] = sigma3(w[i - 2]) + w[i - 7] + sigma2(w[i - 15]) + w[i - 16];
  }

  std::shared_ptr<uint32_t> hash(new uint32_t[digest_size], std::default_delete<uint32_t[]>());
  std::shared_ptr<uint32_t[]> hash(new uint32_t[digest_size]());
  size_t mem_size = digest_size * sizeof(uint32_t);
  auto ret = memcpy_s(hash.get(), mem_size, digest, mem_size);
  if (ret != EOK) {
    return false;
  }
  for (int i = 0; i < kIterationNumber; ++i) {
    uint32_t t1 =
      w[i] + constant[i] + hash.get()[7] + sigma1(hash.get()[4]) + ch(hash.get()[4], hash.get()[5], hash.get()[6]);
    uint32_t t2 = sigma0(hash.get()[0]) + ma(hash.get()[0], hash.get()[1], hash.get()[2]);
    uint32_t t1 = w[i] + constant[i] + hash[7] + sigma1(hash[4]) + ch(hash[4], hash[5], hash[6]);
    uint32_t t2 = sigma0(hash[0]) + ma(hash[0], hash[1], hash[2]);
    for (int j = digest_size - 1; j >= 0; --j) {
      if (j == 4) {
        hash.get()[j] = hash.get()[j - 1] + t1;
        hash[j] = hash[j - 1] + t1;
      } else if (j == 0) {
        hash.get()[j] = t1 + t2;
        hash[j] = t1 + t2;
      } else {
        hash.get()[j] = hash.get()[j - 1];
        hash[j] = hash[j - 1];
      }
    }
  }
  for (int i = 0; i < digest_size; ++i) {
    digest[i] += hash.get()[i];
    digest[i] += hash[i];
  }
  return true;
}
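For readers who don't recognize the loop: this is the SHA-256 compression schedule, where two temporaries update an eight-word working state each round (`ma` is the majority function). A minimal Python sketch of one round's register rotation, assuming helper functions equivalent to the `sigma0`/`sigma1`/`ch`/`ma` used above:

```
def sha256_round(state, w_i, k_i, sigma0, sigma1, ch, ma):
    a, b, c, d, e, f, g, h = state
    t1 = (w_i + k_i + h + sigma1(e) + ch(e, f, g)) & 0xFFFFFFFF
    t2 = (sigma0(a) + ma(a, b, c)) & 0xFFFFFFFF
    # Rotate the working registers: e picks up d + t1, a picks up t1 + t2.
    return [(t1 + t2) & 0xFFFFFFFF, a, b, c, (d + t1) & 0xFFFFFFFF, e, f, g]
```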
@@ -219,6 +219,8 @@ constexpr auto kFusedScaleApplyMomentum = "FusedScaleApplyMomentum";
constexpr auto kBasicLSTMCellWeightGradOpName = "BasicLSTMCellWeightGrad";
constexpr auto kBasicLSTMCellInputGradOpName = "BasicLSTMCellInputGrad";
constexpr auto kBasicLSTMCellOpName = "BasicLSTMCell";
constexpr auto kDynamicRNNOpName = "DynamicRNN";
constexpr auto kLSTMInputGradOpName = "LSTMInputGrad";

// attr key name
constexpr auto kAttrInputNames = "input_names";
@@ -292,7 +292,6 @@ class _PynativeExecutor:
    def __init__(self):
        self._executor = PynativeExecutor_.get_instance()

    # TODO(kpy): add a type arg
    def new_graph(self, obj, *args, **kwargs):
        self._executor.new_graph(obj, *args, *(kwargs.values()))
@@ -31,9 +31,11 @@ def set_seed(seed):
    Note:
        The global seed is used by numpy.random, mindspore.common.Initializer, mindspore.ops.composite.random_ops and
        mindspore.nn.probability.distribution.

        If the global seed is not set, these packages will use their own default seeds independently: numpy.random and
        mindspore.common.Initializer will choose a random seed, while mindspore.ops.composite.random_ops and
        mindspore.nn.probability.distribution will use zero.

        A seed set by numpy.random.seed() is only used by numpy.random, while a seed set by this API will also be
        used by numpy.random, so setting all seeds through this API is recommended.
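A short usage sketch of the behavior the note describes (assuming the `mindspore.common.set_seed` import path this file exports; values are illustrative):

```
import numpy as np
from mindspore.common import set_seed

set_seed(2)             # also seeds numpy.random, per the note above
a = np.random.rand(3)   # reproducible across runs

np.random.seed(2)       # seeds numpy.random only; MindSpore random ops are unaffected
```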
@@ -97,14 +99,13 @@ def _get_op_seed(op_seed, kernel_name):
        op_seed (int): The op seed to be updated.
        kernel_name (string): The random op kernel.
    """
    if ((kernel_name, op_seed) not in _KERNEL_SEED) or (_KERNEL_SEED[(kernel_name, op_seed)] == -1):
    if (kernel_name, op_seed) not in _KERNEL_SEED:
        _KERNEL_SEED[(kernel_name, op_seed)] = op_seed
        _KERNEL_SEED[(kernel_name, op_seed)] = 0
    return _KERNEL_SEED[(kernel_name, op_seed)]


def _reset_op_seed():
    """
    Reset op seeds in the kernel's dictionary.
    """
    for key in _KERNEL_SEED:
        _KERNEL_SEED[key] = -1
    for (kernel_name, op_seed) in _KERNEL_SEED:
        _KERNEL_SEED[(kernel_name, op_seed)] = op_seed
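Either way the hunk is read, `_KERNEL_SEED` acts as a per-`(kernel_name, op_seed)` cache: first use records a seed, later calls reuse it, and a reset restores the recorded values. A standalone sketch of that caching pattern (simplified; not the exact MindSpore semantics):

```
_KERNEL_SEED = {}

def get_op_seed(op_seed, kernel_name):
    key = (kernel_name, op_seed)
    if key not in _KERNEL_SEED:
        _KERNEL_SEED[key] = op_seed  # first use: remember the requested seed
    return _KERNEL_SEED[key]

def reset_op_seed():
    for (kernel_name, op_seed) in _KERNEL_SEED:
        _KERNEL_SEED[(kernel_name, op_seed)] = op_seed  # restore initial seeds
```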
@@ -104,6 +104,9 @@ inline const PrimitivePtr kPrimPad = std::make_shared<Primitive>("Pad");
inline const PrimitivePtr kPrimArgMaxWithValue = std::make_shared<Primitive>("ArgMaxWithValue");
inline const PrimitivePtr kPrimUnique = std::make_shared<Primitive>("Unique");
inline const PrimitivePtr kPrimUniqueGrad = std::make_shared<Primitive>("UniqueGrad");
inline const PrimitivePtr kPrimExtractImagePatches = std::make_shared<Primitive>("ExtractImagePatches");
inline const PrimitivePtr kPrimDynamicRNN = std::make_shared<Primitive>("DynamicRNN");
inline const PrimitivePtr kPrimDynamicRNNGrad = std::make_shared<Primitive>("DynamicRNNGrad");

// NN
inline const PrimitivePtr kPrimFlatten = std::make_shared<Primitive>("Flatten");

@@ -213,6 +216,7 @@ inline const PrimitivePtr kPrimRound = std::make_shared<Primitive>("Round");
inline const PrimitivePtr kPrimExp = std::make_shared<Primitive>("Exp");
inline const PrimitivePtr kPrimLog = std::make_shared<Primitive>("Log");
inline const PrimitivePtr kPrimRsqrt = std::make_shared<Primitive>("Rsqrt");
inline const PrimitivePtr kPrimSplitV = std::make_shared<Primitive>("SplitV");

// Statements
inline const PrimitivePtr kPrimReturn = std::make_shared<Primitive>("return");
@@ -28,7 +28,6 @@
#include <algorithm>
#include <type_traits>
#include <typeinfo>
#include <regex>

#include "abstract/abstract_value.h"
@@ -373,25 +372,17 @@ class TensorDataImpl : public TensorData {
                  std::is_same<T, double>::value) {
      return str;
    }
    // Use regular expressions to replace placeholders.
    std::regex pattern("#+");
    std::smatch result;
    int bias = 0;
    std::string::const_iterator start = str.begin();
    std::string::const_iterator end = str.end();
    while (std::regex_search(start, end, result, pattern)) {
      const int len = result.str(0).length();
      const int pos = result.position();
      bias += pos;
      if (bias > static_cast<int32_t>(str.length())) {
        return "";
      }
    // Replace # with placeholder.
    size_t index = str.find('#');
    while (index != str.npos) {
      size_t pos = index;
      while (str[pos] == '#') {
        pos++;
      }
      int len = pos - index;
      std::string space(max_width - len, ' ');
      str = str.replace(bias, len, space);
      // Update the starting position of the search.
      start = str.begin() + bias;
      end = str.end();
      str = str.replace(index, len, space);
      index = str.find('#', index);
    }
    return str;
  }
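The rewrite drops `<regex>` in favor of a plain scan: find each run of '#' and overwrite it with enough spaces to pad the field to `max_width`. A compact Python equivalent of the new loop (it assumes, as the C++ code does, that `max_width` is at least the run length):

```
def pad_placeholders(s: str, max_width: int) -> str:
    # Replace each run of '#' with spaces so columns line up at max_width.
    i = s.find('#')
    while i != -1:
        j = i
        while j < len(s) and s[j] == '#':
            j += 1
        s = s[:i] + ' ' * (max_width - (j - i)) + s[j:]
        i = s.find('#', i)
    return s

print(repr(pad_placeholders("ab###cd", 5)))  # 'ab  cd'
```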
@@ -269,7 +269,7 @@ class Tensor : public MetaTensor {

  std::string id() const { return id_; }
  TypePtr cast_dtype() { return cast_dtype_; }
  void set_cast_dtype(TypePtr dtype) { cast_dtype_ = dtype; }
  void set_cast_dtype(TypePtr dtype = nullptr) { cast_dtype_ = dtype; }

  void SetNeedWait(bool need_wait) {
    if (event_ != nullptr) {
@@ -266,7 +266,7 @@ std::string KeywordArg::ToString() const {
  buffer << "KeywordArg[";
  buffer << "key : " << key_;
  MS_EXCEPTION_IF_NULL(value_);
  buffer << "value : " << value_->ToString();
  buffer << ", value : " << value_->ToString();
  buffer << "]";
  return buffer.str();
}
@@ -2474,7 +2474,7 @@ class ConcatDataset(DatasetOp):
            raise ValueError("The parameter num_shards of DistributedSampler must be positive int!")

        if sampler.get_num_samples() is not None:
            raise ValueError("The parameter num_samples of DistributedSampler must be set!")
            raise ValueError("The parameter num_samples of DistributedSampler is not supported to be set!")

        self._sampler = _select_sampler(None, sampler, None, None, None)
        cumulative_samples_nums = 0
@@ -3256,14 +3256,13 @@ class SamplerFn:
        # Event for end of epoch
        if multi_process is True:
            self.eoe = multiprocessing.Event()
            self.eof = multiprocessing.Event()
        else:
            self.eoe = threading.Event()
            self.eof = threading.Event()
        # Create workers
        for _ in range(num_worker):
            if multi_process is True:
                worker = _GeneratorWorkerMp(dataset, self.eoe, self.eof)
                worker = _GeneratorWorkerMp(dataset, self.eoe)
            else:
                worker = _GeneratorWorkerMt(dataset, self.eoe, self.eof)
            worker.daemon = True
@@ -3304,15 +3303,40 @@ class SamplerFn:

    def __del__(self):
        self.eoe.set()
        self.eof.set()
        if self.multi_process is False:
            self.eof.set()
        for w in self.workers:
            w.join()


def _generator_worker_loop(dataset, idx_queue, result_queue, eoe, eof):
def _generator_worker_loop_mp(dataset, idx_queue, result_queue, eoe):
    """
    Multiprocessing or multithread generator worker process loop.
    Multiprocessing generator worker process loop.
    """
    while True:
        # Fetch index, block
        try:
            idx = idx_queue.get()
        except KeyboardInterrupt:
            raise Exception("Generator worker receives KeyboardInterrupt")
        if idx is None:
            # When the queue goes out of scope in the master process, a None item can be fetched from the queue.
            # Upon receiving None, the worker process should check whether EOE is set.
            assert eoe.is_set(), ""
            return
        # Fetch data; any exception from __getitem__ will terminate the worker and time out the master process
        result = dataset[idx]
        # Send data, block
        try:
            result_queue.put(result)
        except KeyboardInterrupt:
            raise Exception("Generator worker receives KeyboardInterrupt")
        del result, idx


def _generator_worker_loop_mt(dataset, idx_queue, result_queue, eoe, eof):
    """
    Multithread generator worker loop.
    """
    while True:
        # Fetch index, block
@@ -3360,7 +3384,7 @@ class _GeneratorWorkerMt(threading.Thread):
    def __init__(self, dataset, eoe, eof):
        self.idx_queue = queue.Queue(16)
        self.res_queue = queue.Queue(16)
        super().__init__(target=_generator_worker_loop, args=(dataset, self.idx_queue, self.res_queue, eoe, eof))
        super().__init__(target=_generator_worker_loop_mt, args=(dataset, self.idx_queue, self.res_queue, eoe, eof))

    def put(self, item):
        """

@@ -3372,7 +3396,7 @@ class _GeneratorWorkerMt(threading.Thread):
        """
        Get function for worker result queue. Block with timeout.
        """
        return self.res_queue.get(timeout=10)
        return self.res_queue.get(timeout=30)


class _GeneratorWorkerMp(multiprocessing.Process):
@@ -3380,10 +3404,10 @@ class _GeneratorWorkerMp(multiprocessing.Process):
    Worker process for multiprocess Generator.
    """

    def __init__(self, dataset, eoe, eof):
    def __init__(self, dataset, eoe):
        self.idx_queue = multiprocessing.Queue(16)
        self.res_queue = multiprocessing.Queue(16)
        super().__init__(target=_generator_worker_loop, args=(dataset, self.idx_queue, self.res_queue, eoe, eof))
        super().__init__(target=_generator_worker_loop_mp, args=(dataset, self.idx_queue, self.res_queue, eoe))

    def put(self, item):
        """

@@ -3395,17 +3419,18 @@ class _GeneratorWorkerMp(multiprocessing.Process):
        """
        Get function for worker result queue. Block with timeout.
        """
        while check_iterator_cleanup() is False:
            try:
                return self.res_queue.get(timeout=10)
            except multiprocessing.TimeoutError:
                continue

        raise Exception("Generator worker process timeout")
        # Relax 10s to 30s, since it sometimes causes a "Generator worker process timeout"
        # when we run too many iterators with an infinite epoch (num_epoch=-1)
        return self.res_queue.get(timeout=30)


    def __del__(self):
        # Try to destruct here; sometimes the class itself is destructed in advance,
        # so "self" will be a NoneType
        try:
            self.terminate()
        except AttributeError:
            pass


class GeneratorDataset(MappableDataset):
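Both worker classes wrap the same producer/consumer shape: the master feeds indices into `idx_queue`, the worker pushes `dataset[idx]` back through `res_queue`, and a timeout on `get` guards against hung workers. A stripped-down, runnable sketch of that pattern (toy dataset and sentinel shutdown, not the MindSpore classes):

```
import multiprocessing

def _worker(dataset, idx_q, res_q):
    while True:
        idx = idx_q.get()
        if idx is None:  # sentinel: no more work
            return
        res_q.put(dataset[idx])

if __name__ == "__main__":
    dataset = [x * x for x in range(10)]
    idx_q, res_q = multiprocessing.Queue(16), multiprocessing.Queue(16)
    p = multiprocessing.Process(target=_worker, args=(dataset, idx_q, res_q), daemon=True)
    p.start()
    for i in range(10):
        idx_q.put(i)
    print([res_q.get(timeout=30) for _ in range(10)])
    idx_q.put(None)
    p.join()
```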
@@ -86,6 +86,38 @@ class Compose:
        >>>                                py_vision.RandomErasing()])
        >>> # apply the transform to the dataset through dataset.map()
        >>> dataset = dataset.map(operations=transform, input_columns="image")
        >>>
        >>> # Compose is also invoked implicitly, by just passing in a list of ops
        >>> # the above example then becomes:
        >>> transform_list = [py_vision.Decode(),
        >>>                   py_vision.RandomHorizontalFlip(0.5),
        >>>                   py_vision.ToTensor(),
        >>>                   py_vision.Normalize((0.491, 0.482, 0.447), (0.247, 0.243, 0.262)),
        >>>                   py_vision.RandomErasing()]
        >>>
        >>> # apply the transform to the dataset through dataset.map()
        >>> dataset = dataset.map(operations=transform_list, input_columns="image")
        >>>
        >>> # Certain C++ and Python ops can be combined, but not all of them
        >>> # An example of combined operations
        >>> import mindspore.dataset as ds
        >>> import mindspore.dataset.transforms.c_transforms as c_transforms
        >>> import mindspore.dataset.vision.c_transforms as c_vision
        >>>
        >>> data = ds.NumpySlicesDataset(arr, column_names=["cols"], shuffle=False)
        >>> transformed_list = [py_transforms.OneHotOp(2), c_transforms.Mask(c_transforms.Relational.EQ, 1)]
        >>> data = data.map(operations=transformed_list, input_columns=["cols"])
        >>>
        >>> # Here is an example of mixing vision ops
        >>> data_dir = "/path/to/imagefolder_directory"
        >>> data1 = ds.ImageFolderDataset(dataset_dir=data_dir, shuffle=False)
        >>> input_columns = ["column_names"]
        >>> op_list = [c_vision.Decode(),
        >>>            c_vision.Resize((224, 244)),
        >>>            py_vision.ToPIL(),
        >>>            np.array,  # need to convert PIL image to a NumPy array to pass it to C++ operation
        >>>            c_vision.Resize((24, 24))]
        >>> data1 = data1.map(operations=op_list, input_columns=input_columns)
    """

    @check_compose_list
@@ -28,10 +28,10 @@ if (SUPPORT_TRAIN)
    elseif (PLATFORM_ARM32)
        set(COMPONENT_NAME runtime-arm32-${PROCESS_UNIT}-train)
    elseif (WIN32)
        set(PARSER_NAME libconverter-parser-win-${PROCESS_UNIT})
        set(COMPONENT_NAME converter-win-${PROCESS_UNIT})
        set(PARSER_NAME libconverter-parser-win-${PROCESS_UNIT}-train)
        set(COMPONENT_NAME converter-win-${PROCESS_UNIT}-train)
    else ()
        set(COMPONENT_NAME converter-ubuntu)
        set(COMPONENT_NAME converter-ubuntu-train)
    endif()
    set(RUN_X86_COMPONENT_NAME runtime-x86-${PROCESS_UNIT}-train)
else ()
@@ -0,0 +1,113 @@
#ifdef __arm__
#ifndef __aarch64__

.text
.align 5
.global ConvDwInt8PostAlign4PerChannel
#ifndef __APPLE__
.type ConvDwInt8PostAlign4PerChannel, %function
#endif

// void ConvDwInt8PostAlign4PerChannel(int8_t *dst, int32_t *buffer, int channel4, int32_t output_zp,
//                                     int32_t *out_multiplier, int32_t *left_shift, int32_t *right_shift,
//                                     int32_t acc_min, int32_t acc_max);
// r0: dst, r1: buffer, r2: num_pixels, r3: output_zp, r4: out_multiplier,
// r5: left_shift, r6: right_shift, r7: acc_min, r8: acc_max

ConvDwInt8PostAlign4PerChannel:
    // at return, clang generates "push {lr}, pop {pc}" while gcc will generate "bx lr"
    // according to https://stackoverflow.com/questions/53625807
    // even if we jump to the link register instead of saving it, we still have to save it in subroutine calls anyway
    // clang's rule seems simpler, though there are no subroutine calls here
    // r4-r8 and q4-q7 must be saved according to https://static.docs.arm.com/ihi0042/i/aapcs32.pdf
    push {r4-r8, r10}
    vpush {q4-q7}
    add sp, sp, #88

    vdup.32 q15, r3          // output_zp

    ldr r4, [sp]             // out_multiplier
    ldr r5, [sp, #4]         // left_shift
    ldr r6, [sp, #8]         // right_shift

    ldr r7, [sp, #12]        // acc_min
    vdup.32 q11, r7

    ldr r8, [sp, #16]        // acc_max
    vdup.32 q10, r8

    mov r10, r0

LoopDepth8:
    cmp r2, #8
    blt End
    vld1.32 {q0}, [r1]!
    vld1.32 {q13}, [r5]!
    vshl.s32 q0, q0, q13
    vld1.32 {q14}, [r4]!
    vqrdmulh.s32 q0, q0, q14
    vld1.32 {q12}, [r6]!
    vand q4, q0, q12
    vshr.s32 q4, q4, #31
    vqadd.s32 q0, q0, q4
    vrshl.s32 q0, q0, q12
    vadd.i32 q0, q0, q15
    vmax.s32 q0, q0, q11
    vmin.s32 q0, q0, q10
    vqmovn.s32 d4, q0

    vld1.32 {q1}, [r1]!
    vld1.32 {q13}, [r5]!
    vshl.s32 q1, q1, q13
    vld1.32 {q14}, [r4]!
    vqrdmulh.s32 q1, q1, q14
    vld1.32 {q12}, [r6]!
    vand q4, q1, q12
    vshr.s32 q4, q4, #31
    vqadd.s32 q1, q1, q4
    vrshl.s32 q1, q1, q12
    vadd.i32 q1, q1, q15
    vmax.s32 q1, q1, q11
    vmin.s32 q1, q1, q10
    vqmovn.s32 d5, q1
    vqmovn.s16 d4, q2

    vst1.8 {d4}, [r10]!

    sub r2, r2, #8
    b LoopDepth8

LoopDepth4:
    cmp r2, #4
    blt End
    vld1.32 {q0}, [r1]!
    vld1.32 {q13}, [r5]!
    vshl.s32 q0, q0, q13
    vld1.32 {q14}, [r4]!
    vqrdmulh.s32 q0, q0, q14
    vld1.32 {q12}, [r6]!
    vand q4, q0, q12
    vshr.s32 q4, q4, #31
    vqadd.s32 q0, q0, q4
    vrshl.s32 q0, q0, q12
    vadd.i32 q0, q0, q15
    vmax.s32 q0, q0, q11
    vmin.s32 q0, q0, q10

    vqmovn.s32 d0, q0
    vqmovn.s16 d0, q0

    vst1.8 {d0[0]}, [r10]!
    vst1.8 {d0[1]}, [r10]!
    vst1.8 {d0[2]}, [r10]!
    vst1.8 {d0[3]}, [r10]!

    sub r2, r2, #4
    b LoopDepth4
End:
    sub sp, sp, #88
    vpop {q4-q7}
    pop {r4-r8, r10}
    bx lr

#endif
#endif
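The `vqrdmulh`/`vrshl` pair above is the usual fixed-point requantization: a saturating rounded doubling multiply followed by a rounding right shift, then zero-point addition and clamping. A simplified scalar model in Python (the negative-value rounding correction done by `vand`/`vshr`/`vqadd` is folded into a plain rounding shift here, so this is an approximation, not a bit-exact reimplementation):

```
def requantize(acc, left_shift, multiplier, right_shift, output_zp,
               acc_min=-128, acc_max=127):
    # Pre-shift, then take the rounded high half of the doubled product (vqrdmulh).
    x = acc << left_shift
    x = (x * multiplier + (1 << 30)) >> 31
    # Rounding right shift (vrshl by a negative amount in the NEON code).
    if right_shift > 0:
        x = (x + (1 << (right_shift - 1))) >> right_shift
    # Add the output zero point and clamp (vadd / vmax / vmin).
    return max(acc_min, min(acc_max, x + output_zp))
```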
@@ -18,7 +18,7 @@
#include <string.h>
#include "nnacl/errorcode.h"

int ExpandDims(float *input_ptr, float *output_ptr, size_t data_size) {
int ExpandDims(void *input_ptr, void *output_ptr, size_t data_size) {
  memcpy(output_ptr, input_ptr, data_size);
  return NNACL_OK;
}
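Widening `ExpandDims` to `void *` works because inserting a size-1 axis never rearranges element data; the kernel is a pure `memcpy` regardless of element type, and only shape metadata changes. The same fact, stated in NumPy terms:

```
import numpy as np

a = np.ones((2, 3), dtype=np.float32)
b = np.expand_dims(a, axis=0)      # shape (1, 2, 3)
assert b.base is a                 # a view, not a copy: same buffer
assert a.tobytes() == b.tobytes()  # identical byte layout
```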
@@ -27,7 +27,7 @@ typedef struct ExpandDimsParameter {
#ifdef __cplusplus
extern "C" {
#endif
int ExpandDims(float *input_ptr, float *output_ptr, size_t data_size);
int ExpandDims(void *input_ptr, void *output_ptr, size_t data_size);
#ifdef __cplusplus
}
#endif
@@ -1,26 +0,0 @@
/**
 * Copyright 2020 Huawei Technologies Co., Ltd
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
#include "nnacl/fp32/leaky_relu.h"

void DoLeakyRelu(float *input, float *output, LeakyReluParameter *param, int task_id) {
  for (int i = task_id; i < param->input_num_; i += param->op_parameter_.thread_num_) {
    if (input[i] <= 0) {
      output[i] = input[i] * param->slope_[0];
    } else {
      output[i] = input[i];
    }
  }
}
@@ -1,30 +0,0 @@
/**
 * Copyright 2020 Huawei Technologies Co., Ltd
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
#ifndef MINDSPORE_LITE_NNACL_PRELU_H_
#define MINDSPORE_LITE_NNACL_PRELU_H_

#include "nnacl/op_base.h"
#include "nnacl/leaky_relu_parameter.h"

#ifdef __cplusplus
extern "C" {
#endif
void DoLeakyRelu(float *input, float *output, LeakyReluParameter *prelu_param_, int task_id);
#ifdef __cplusplus
}
#endif

#endif  // MINDSPORE_LITE_NNACL_PRELU_H_
@@ -35,6 +35,9 @@ void PostFuncInt8C4(const int32_t *in, const int32_t *bias, int8_t *out, size_t
#ifdef ENABLE_ARM
void ConvDwInt8Row(int32_t *output_ptr, const int8_t *input_ptr, const int16_t *weight_ptr, int num_pixels,
                   int output_channel, int input_step, int8_t input_zp);
void ConvDwInt8PostAlign4PerChannel(int8_t *dst, int32_t *buffer, int channel4, int32_t output_zp,
                                    int32_t *out_multiplier, int32_t *left_shift, int32_t *right_shift,
                                    int32_t acc_min, int32_t acc_max);
void ConvDwInt8PostAlign4(int8_t *dst, int32_t *buffer, int num_pixels, int32_t output_zp, int32_t out_multiplier,
                          int32_t left_shift, int32_t right_shift, int32_t acc_min, int32_t acc_max);
void IndirectGemmInt16to32_8x4(int32_t *dst, const int16_t *src, const int16_t *weight, size_t ksize, size_t ic8,

@@ -64,9 +67,6 @@ void IndirectGemmInt8_4x4(int8_t *output, const int8_t *input, const int8_t *wei
void DeconvDwInt8Center(int32_t *dst, const int16_t *src, const int16_t *weight, size_t height, size_t width,
                        size_t kernel_h, size_t kernel_w, size_t out_h_step, size_t block_channel, size_t in_sh_step,
                        size_t in_sw_step, size_t in_kh_step, size_t in_kw_step);
void ConvDwInt8PostAlign4PerChannel(int8_t *dst, int32_t *buffer, int channel4, int32_t output_zp,
                                    int32_t *out_multiplier, int32_t *left_shift, int32_t *right_shift,
                                    int32_t acc_min, int32_t acc_max);
#endif

#ifdef __cplusplus
@@ -39,7 +39,7 @@ void ConvDwInt8Post(int8_t *dst, int32_t *buffer, int output_w, int channel, int
  // support per-channel
  for (int w = 0; w < output_w; w++) {
    int channel4 = 0;
#ifdef ENABLE_ARM64
#ifdef ENABLE_ARM
    channel4 = channel / 4 * 4;
    ConvDwInt8PostAlign4PerChannel(dst, buffer, channel4, output_zp, out_multiplier, left_shift, right_shift, acc_min,
                                   acc_max);
@@ -17,8 +17,7 @@
#include "nnacl/int8/leaky_relu_int8.h"
#include "nnacl/errorcode.h"

int DoLeakReluInt8(int8_t *inputs, int8_t *output_ptr, LeakyReluQuantArg *quant_prelu_parm, QuantArg *input_quant,
                   int task_id) {
int DoLeakReluInt8(int8_t *inputs, int8_t *output_ptr, LeakyReluQuantArg *quant_prelu_parm, int task_id) {
  if (quant_prelu_parm == NULL) {
    return NNACL_NULL_PTR;
  }

@@ -27,17 +26,12 @@ int DoLeakReluInt8(int8_t *inputs, int8_t *output_ptr, LeakyReluQuantArg *quant_
  const float output_inverse_scale = 1.f / output_scale;
  int output_dim = quant_prelu_parm->input_dim_;

  float scale = quant_prelu_parm->quant_arg.in_args_.scale_ * output_inverse_scale;
  float bias = -quant_prelu_parm->quant_arg.in_args_.zp_ * scale;
  for (int i = 0; i < output_dim; i++) {
    input_quant[i].scale_ = quant_prelu_parm->quant_arg.in_args_.scale_;
    input_quant[i].zp_ = quant_prelu_parm->quant_arg.in_args_.zp_;
  }

  for (int i = 0; i < output_dim; i++) {
    float scale = input_quant[i].scale_ * output_inverse_scale;
    float bias = -input_quant[i].zp_ * scale;
    for (int j = task_id; j < quant_prelu_parm->element_num; j += quant_prelu_parm->op_parameter_.thread_num_) {
      if (inputs[j] <= 0) {
        int32_t output_tmp = round(inputs[j] * quant_prelu_parm->slope_[0] * scale + bias) + output_zp;
        int32_t output_tmp = round(inputs[j] * quant_prelu_parm->slope_ * scale + bias) + output_zp;
        if (output_tmp > 127) {
          output_ptr[j] = 127;
        } else if (output_tmp < -128) {

@@ -57,6 +51,5 @@ int DoLeakReluInt8(int8_t *inputs, int8_t *output_ptr, LeakyReluQuantArg *quant_
      }
    }
  }
  free(input_quant);
  return NNACL_OK;
}
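In the quantized leaky ReLU above, `scale = in_scale / out_scale` and `bias = -in_zp * scale` map int8 inputs into the output's quantized domain; negative inputs are additionally multiplied by the float slope before rounding, adding the output zero point, and clamping to [-128, 127]. A scalar Python sketch of that formula (parameter values illustrative, positive branch assumed symmetric to the hunk's negative branch):

```
def leaky_relu_int8(x, slope, in_scale, in_zp, out_scale, out_zp):
    scale = in_scale / out_scale
    bias = -in_zp * scale
    if x <= 0:
        y = round(x * slope * scale + bias) + out_zp
    else:
        y = round(x * scale + bias) + out_zp
    return max(-128, min(127, y))

print(leaky_relu_int8(-40, 0.1, 0.02, 0, 0.02, 0))  # -4
```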
@@ -23,8 +23,7 @@
#ifdef __cplusplus
extern "C" {
#endif
int DoLeakReluInt8(int8_t *inputs, int8_t *output_ptr, LeakyReluQuantArg *quant_Prelu_parm, QuantArg *input_quant,
                   int task_id);
int DoLeakReluInt8(int8_t *inputs, int8_t *output_ptr, LeakyReluQuantArg *quant_Prelu_parm, int task_id);
#ifdef __cplusplus
}
#endif
@@ -1,29 +0,0 @@
/**
 * Copyright 2020 Huawei Technologies Co., Ltd
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#ifndef MINDSPORE_LITE_NNACL_LEAKY_RELU_PARAMETER_H_
#define MINDSPORE_LITE_NNACL_LEAKY_RELU_PARAMETER_H_

#include "nnacl/op_base.h"

typedef struct LeakyReluParameter {
  OpParameter op_parameter_;
  float *slope_;
  size_t slope_num_;
  int input_num_;
} LeakyReluParameter;

#endif  // MINDSPORE_LITE_NNACL_LEAKY_RELU_PARAMETER_H_
@@ -252,7 +252,7 @@ typedef struct PowerQuantArg {
typedef struct LeakyReluQuantArg {
  OpParameter op_parameter_;
  PreluQuantArg quant_arg;
  float *slope_;
  float slope_;
  int64_t axis_;
  const int *in_shape_;
  const int *out_shape_;