pr to master #8

Open
m7grui4p8 wants to merge 201 commits from p69201753/mindspore:cpu-kernel-reuse-1 into master
306 changed files with 5992 additions and 2208 deletions

View File

@ -69,6 +69,8 @@ MindSpore offers build options across multiple backends:
| | Ubuntu-aarch64 | ✔️ |
| | EulerOS-x86 | ✔️ |
| | EulerOS-aarch64 | ✔️ |
| | CentOS-x86 | ✔️ |
| | CentOS-aarch64 | ✔️ |
| GPU CUDA 10.1 | Ubuntu-x86 | ✔️ |
| CPU | Ubuntu-x86 | ✔️ |
| | Ubuntu-aarch64 | ✔️ |
@ -79,7 +81,7 @@ For installation using `pip`, take `CPU` and `Ubuntu-x86` build version as an ex
1. Download the whl package from the [MindSpore download page](https://www.mindspore.cn/versions/en) and install it.
```
pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/0.7.0-beta/MindSpore/cpu/ubuntu_x86/mindspore-0.7.0-cp37-cp37m-linux_x86_64.whl
pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.0.0/MindSpore/cpu/ubuntu_x86/mindspore-1.0.0-cp37-cp37m-linux_x86_64.whl
```
2. Run the following command to verify the install.
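The verification snippet itself lies outside this hunk; a minimal sketch of such a check, assuming the whl was installed into the active Python 3.7 environment:
```
# Minimal import check; expects to print 1.0.0 after the upgrade.
import mindspore
print(mindspore.__version__)
```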
@ -136,8 +138,8 @@ currently the containerized build options are supported as follows:
For the `CPU` backend, you can directly pull and run the latest stable image using the command below:
```
docker pull mindspore/mindspore-cpu:0.7.0-beta
docker run -it mindspore/mindspore-cpu:0.7.0-beta /bin/bash
docker pull mindspore/mindspore-cpu:1.0.0
docker run -it mindspore/mindspore-cpu:1.0.0 /bin/bash
```
* GPU
@ -170,8 +172,8 @@ currently the containerized build options are supported as follows:
```
Then you can pull and run the latest stable image using the command below:
```
docker pull mindspore/mindspore-gpu:0.7.0-beta
docker run -it --runtime=nvidia --privileged=true mindspore/mindspore-gpu:0.7.0-beta /bin/bash
docker pull mindspore/mindspore-gpu:1.0.0
docker run -it --runtime=nvidia --privileged=true mindspore/mindspore-gpu:1.0.0 /bin/bash
```
To test if the docker image works, please execute the python code below and check the output:
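The test snippet itself lies outside this hunk; a sketch of the kind of check the README describes, assuming the 1.0 operations API and a container started from the GPU image above:
```
# Multiply two small tensors and check the printed result.
import numpy as np
import mindspore.context as context
import mindspore.nn as nn
from mindspore import Tensor
from mindspore.ops import operations as P

context.set_context(mode=context.GRAPH_MODE, device_target="GPU")

class Mul(nn.Cell):
    def __init__(self):
        super(Mul, self).__init__()
        self.mul = P.Mul()

    def construct(self, x, y):
        return self.mul(x, y)

x = Tensor(np.array([1.0, 2.0, 3.0]).astype(np.float32))
y = Tensor(np.array([4.0, 5.0, 6.0]).astype(np.float32))
print(Mul()(x, y))  # expect [ 4. 10. 18.]
```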
@ -206,7 +208,7 @@ please check out [docker](docker/README.md) repo for the details.
## Quickstart
See the [Quick Start](https://www.mindspore.cn/tutorial/training/en/master/quick_start/quick_start.html)
to implement image classification.
## Docs

View File

@ -66,6 +66,8 @@ MindSpore offers build options across multiple backends:
| | Ubuntu-aarch64 | ✔️ |
| | EulerOS-x86 | ✔️ |
| | EulerOS-aarch64 | ✔️ |
| | CentOS-x86 | ✔️ |
| | CentOS-aarch64 | ✔️ |
| GPU CUDA 10.1 | Ubuntu-x86 | ✔️ |
| CPU | Ubuntu-x86 | ✔️ |
| | Ubuntu-aarch64 | ✔️ |
@ -76,7 +78,7 @@ MindSpore offers build options across multiple backends:
1. Download the whl package from the [MindSpore download page](https://www.mindspore.cn/versions) and install it.
```
pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/0.7.0-beta/MindSpore/cpu/ubuntu_x86/mindspore-0.7.0-cp37-cp37m-linux_x86_64.whl
pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.0.0/MindSpore/cpu/ubuntu_x86/mindspore-1.0.0-cp37-cp37m-linux_x86_64.whl
```
2. Run the following command to verify the installation.
@ -132,8 +134,8 @@ MindSpore Docker images are hosted on [Docker Hub](https://hub.docker.com/r/mindspore
For the `CPU` backend, you can directly pull and run the latest stable image using the command below:
```
docker pull mindspore/mindspore-cpu:0.7.0-beta
docker run -it mindspore/mindspore-cpu:0.7.0-beta /bin/bash
docker pull mindspore/mindspore-cpu:1.0.0
docker run -it mindspore/mindspore-cpu:1.0.0 /bin/bash
```
* GPU
@ -166,8 +168,8 @@ MindSpore Docker images are hosted on [Docker Hub](https://hub.docker.com/r/mindspore
```
Then you can pull and run the latest stable image using the command below:
```
docker pull mindspore/mindspore-gpu:0.7.0-beta
docker run -it --runtime=nvidia --privileged=true mindspore/mindspore-gpu:0.7.0-beta /bin/bash
docker pull mindspore/mindspore-gpu:1.0.0
docker run -it --runtime=nvidia --privileged=true mindspore/mindspore-gpu:1.0.0 /bin/bash
```
To test whether Docker works correctly, run the Python code below and check the output:

View File

@ -1,3 +1,112 @@
# Release 1.0.0
## Major Features and Improvements
### MindSpore Training and Inference Framework
#### Ascend 910
* New models
* DenseNet121: a dense convolutional neural network that connects each layer to every other layer in a feed-forward fashion, for object recognition on the ImageNet dataset.
* UNet2D-Medical: a U-Net model for 2D biomedical image segmentation (Convolutional Networks for Biomedical Image Segmentation), on the ISBI Challenge database.
* Frontend and user interface
* Second-Order Optimization
* Enable second-order optimization for BERT on Ascend 910, which can achieve a masked LM accuracy of 71.3% in 800 seconds using 8 Ascend 910 devices (BERT-Large @MLPerf v0.7 dataset).
* New GNN model BGCF
* A Bayesian Graph Convolutional Filtering network, which naturally incorporates the uncertainty in the user-item interaction graph and shows excellent recommendation performance on the Amazon-Beauty dataset.
* Add append interface for SequentialCell.
* Add a level `auto` for AMP.
* Executor and performance optimization
* Support quantized networks (ResNet-50, YOLOv3, and MobileNetV2).
* Ease-of-use improvements: faster project compilation, regularized CMakeLists, and independent compilation and installation of cuDNN and CUDA.
* Data processing, augmentation, and save format
* Support returning string types from GeneratorDataset (a usage sketch follows below)
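A usage sketch of the string-returning generator (not part of this diff; assumes the 1.0 dataset API accepts numpy unicode arrays as column values):
```
import numpy as np
import mindspore.dataset as ds

def gen():
    for text in ["hello", "world"]:
        yield (np.array(text),)  # a numpy '<U' array carries the string

data = ds.GeneratorDataset(gen, column_names=["text"])
for row in data.create_dict_iterator():
    print(row["text"])
```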
#### Other Hardware Support
* GPU platform
* Enable second-order optimization for ResNet-50 on GPU, which achieves a 30% improvement in training time compared to SGD with Momentum (ResNet-50 @ImageNet).
* CPU platform
* ...
#### User interface change log
* Remove global object GradOperation in Autodiff([!5011](https://gitee.com/mindspore/mindspore/pulls/5011))
* Remove useless attribute 'name' in Autodiff([!5172](https://gitee.com/mindspore/mindspore/pulls/5172))
* Rectify distributed init([!5350](https://gitee.com/mindspore/mindspore/pulls/5350))
* Move the setting of ParallelMode from train.parallel_utils to context([!5351](https://gitee.com/mindspore/mindspore/pulls/5351))
* Modification of save_checkpoint([!5482](https://gitee.com/mindspore/mindspore/pulls/5482))
* Wrap numpy random seed into an API([!5634](https://gitee.com/mindspore/mindspore/pulls/5634))
* Delete enable_fused_layernorm in some modelzoo scripts([!5665](https://gitee.com/mindspore/mindspore/pulls/5665))
* Move 'multi-subgraphs' interface to internal([!5696](https://gitee.com/mindspore/mindspore/pulls/5696))
* Rename mirror_mean to gradient_mean([!5700](https://gitee.com/mindspore/mindspore/pulls/5700))
* Remove default value of 'group' of DepthWiseConv2d([!5865](https://gitee.com/mindspore/mindspore/pulls/5865))
* Modify interface for function and remove duplicated def([!5958](https://gitee.com/mindspore/mindspore/pulls/5958))
* Unify Conv2d and DepthwiseConv2d([!5916](https://gitee.com/mindspore/mindspore/pulls/5916))
* Modification of SoftmaxCrossEntropyWithLogits([!5502](https://gitee.com/mindspore/mindspore/pulls/5502))
* Change API set_strategy() to shard()([!5991](https://gitee.com/mindspore/mindspore/pulls/5991)) (see the sketch after this list)
* Move batch_size from bert_net_cfg to cfg([!6233](https://gitee.com/mindspore/mindspore/pulls/6233))
* Remove unused parameters from SummaryRecord `__init__`([!5548](https://gitee.com/mindspore/mindspore/pulls/5548))
* Remove sens parameter of TrainOneStepWithLossScaleCell([!5753](https://gitee.com/mindspore/mindspore/pulls/5753))
* Optimize TrainOneStepCell for user-defined training([!6159](https://gitee.com/mindspore/mindspore/pulls/6159))
* Delete seed0 and seed1 of nn.Dropout([!5735](https://gitee.com/mindspore/mindspore/pulls/5735))
* Delete DataWrapper([!6101](https://gitee.com/mindspore/mindspore/pulls/6101))
* LSTM API optimization([!6374](https://gitee.com/mindspore/mindspore/pulls/6374))
* Merge P\C\F of ops([!5645](https://gitee.com/mindspore/mindspore/pulls/5645))
* Delete SoftmaxCrossEntropyExpand interface([!6607](https://gitee.com/mindspore/mindspore/pulls/6607))
* Adjust GroupNorm interface([!6329](https://gitee.com/mindspore/mindspore/pulls/6329))
* Modify init interface to internal interface([!6651](https://gitee.com/mindspore/mindspore/pulls/6651))
* Log optimization([!5842](https://gitee.com/mindspore/mindspore/pulls/5842))
* Remove useless API dataset.set_dataset_size([!5806](https://gitee.com/mindspore/mindspore/pulls/5806))
* Some Dataset APIs add a usage parameter([!5605](https://gitee.com/mindspore/mindspore/pulls/5605))
* Change the import path, such as from mindspore.dataset.transforms.vision to mindspore.dataset.vision.transforms([!5384](https://gitee.com/mindspore/mindspore/pulls/5384))
* Rename ImageFolderDatasetV2 to ImageFolderDataset([!5384](https://gitee.com/mindspore/mindspore/pulls/5384))
* Dataset.map parameter optimization([!5384](https://gitee.com/mindspore/mindspore/pulls/5384))
* Add new api dataset.get_col_names([!5384](https://gitee.com/mindspore/mindspore/pulls/5384))
* Remove useless API MindRecord finish([!5580](https://gitee.com/mindspore/mindspore/pulls/5580))
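As an illustration, the set_strategy() to shard() rename above changes parallel-strategy configuration roughly as follows (a sketch, assuming the 1.0 primitive API; the strategy tuple is illustrative):
```
from mindspore.ops import operations as P

matmul = P.MatMul()
# 0.7.0-beta: matmul.set_strategy(((2, 1), (1, 2)))
matmul.shard(((2, 1), (1, 2)))  # 1.0.0: same sharding strategy, new name
```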
### MindSpore Lite
* Converter
* Add 6 TFLite ops, 7 Caffe ops, and 1 ONNX op.
* Add support for Windows.
* Support parallel inference of multiple sessions to adapt to more scenarios.
* Support 8-bit weight-only quantization; most mainstream models have small accuracy loss (less than 0.5%) compared to the non-quantized fp32 model.
* CPU & GPU
* Add 20 CPU ops, including FP32, int8/uint8, FP16, and int32 ops.
* Add FP16 support for GPU, and add 14 GPU ops including FP32/FP16.
* Add Buffer/Image2D transform ops for GPU.
* Performance optimization for CPU ops, focused on ARM32.
* Performance optimization for GPU Convolution using Winograd.
* Tool & example
* Add object detection Android Demo.
## Bugfixes
* Models
* fix the constant folding problem in multiply.([!6092](https://gitee.com/mindspore/mindspore/pulls/6092))
* move batch_size from bert_net_cfg to cfg in bert scripts.([!6233](https://gitee.com/mindspore/mindspore/pulls/6233))
* modify the checkpoint file path.([!6137](https://gitee.com/mindspore/mindspore/pulls/6137))
* Python API
* fix semi auto parallel where the parameter of reshape has another user([!5722](https://gitee.com/mindspore/mindspore/pulls/5722))
* raise ValueError when calling hook function in graph mode([!5831](https://gitee.com/mindspore/mindspore/pulls/5831))
* Executor
* fix pynative mode to build temporary nn objects.([!6189](https://gitee.com/mindspore/mindspore/pulls/6189))
* fix the accuracy problem of multiple inputs of multi-card communication operator broadcast.([!6522](https://gitee.com/mindspore/mindspore/pulls/5622))
* fix the problem that the sample distribution interface categorical does not support graph mode.([!5772](https://gitee.com/mindspore/mindspore/pulls/5772))
* fix the random seed failure problem of the polynomial downsampling distribution operator.([!5948](https://gitee.com/mindspore/mindspore/pulls/5948))
* fix unnecessary address binding issues in GPU heterogeneous scenarios.([!6232](https://gitee.com/mindspore/mindspore/pulls/6232))
* GPU platform
* fix for kernel resource leak([!5315](https://gitee.com/mindspore/mindspore/pulls/5315))
* fix for insufficient memory for continuous unit test running([!5617](https://gitee.com/mindspore/mindspore/pulls/5617))
* fix for the memory leak in the sparse slicer([!5578](https://gitee.com/mindspore/mindspore/pulls/5578))
* Data processing
* fix hang when using pyfunc([!6346](https://gitee.com/mindspore/mindspore/pulls/6346))
* fix GPU device queue does not release GIL during resource clean up([!5964](https://gitee.com/mindspore/mindspore/pulls/5964))
* fix hang if script exits abnormally([!6441](https://gitee.com/mindspore/mindspore/pulls/6441))
## Contributors
Thanks goes to these wonderful people:
Adel, AGroupofProbiotocs, anthonyaje, anzhengqi, askmiao, baihuawei, baiyangfan, bai-yangfan, bingyaweng, BowenK, buxue, caifubi, CaoJian, caojian05, caozhou, Cathy, changzherui, chenfei, chengxianbin, chenhaozhe, chenjianping, chenzomi, chenzupeng, chujinjin, cj, cjh9368, Corleone, danish, Danish, dayschan, eric, Eric, fary86, fuzhiye, Gaoxiong, gengdongjie, gongdaguo, gukecai, guoqi, gzhcv, hangq, hanhuifeng2020, Harshvardhan, He, heleiwang, hexia, Hoai, HuangBingjian, huangdongrun, huanghui, huangxinjing, huzhifeng, hwjiaorui, Jesse, jianghui58, jiangzhiwen, Jiaqi, jin-xiulang, jinyaohui, jjfeing, John, Jonathan, jonyguo, jzg, kai00, kingfo, kingxian, kpy, kswang, laiyongqiang, leonwanghui, Li, liangchenghui, liangzelang, lichen_101010, lichenever, lihongkang, lilei, limingqi107, ling, linqingke, liubuyu, liuwenhao4, liuxiao78, liuxiao93, liuyang_655, liuzhongkai, Lixia, lixian, liyanliu, liyong, lizhenyu, luoyang, lvchangquan, lvliang, lz, mahdi, Mahdi, maning202007, Margaret_wangrui, mayang, mengyuanli, nhussain, ougongchang, panfengfeng, panyifeng, Payne, Peilin, peixu_ren, Pengyongrong, qianlong, r1chardf1d0, riemann_penn, root, Sheng, shenwei41, simson, Simson, Su, sunsuodong, tao_yunhao, tinazhang, VectorSL, Wan, wandongdong, wangdongxu, wangmin, wangnan39@huawei.com, wangyue01, wangzhe, wanyiming, Wei, wenchunjiang, wilfChen, WilliamLian, wsc, wukesong, wuweikang, wuxuejian, Xiaoda, xiefangqi, xuanyue, xulei2020, Xun, xuyongfei, yanghaitao, yanghaitao1, yanghaoran, YangLuo, yangruoqi713, yankai, yanzhenxiang2020, yao_yf, yepei6, yeyunpeng, Yi, yoni, yoonlee666, yuchaojie, yujianfeng, yuximiao, zengzitao, Zhang, zhanghaibo5@huawei.com, zhanghuiyao, zhangyihui, zhangz0911gm, zhanke, zhanyuan, zhaodezan, zhaojichen, zhaoting, zhaozhenlong, zhengjun10, zhoufeng, zhousiyi, zhouyaqiang, Zichun, Zirui, Ziyan, zjun, ZPaC
Contributions of any kind are welcome!
# Release 0.7.0-beta
## Major Features and Improvements

akg

@ -1 +1 @@
Subproject commit 4d897c23fc41c5f7013efd0c517796233671518a
Subproject commit 6c492a8c9d9730ad11ffc5481cc532ae500b0da5

View File

@ -16,12 +16,13 @@
@title mindspore_build
SET BASEPATH=%CD%
IF NOT EXIST "%BASEPATH%/build" (
SET BUILD_PATH=%BASEPATH%/build
IF NOT EXIST "%BUILD_PATH%" (
md "build"
)
cd %BASEPATH%/build
set BUILD_PATH=%CD%
cd %BUILD_PATH%
IF NOT EXIST "%BUILD_PATH%/mindspore" (
md "mindspore"
@ -38,7 +39,7 @@ IF "%1%" == "lite" (
call :run_cmake
IF errorlevel 1 (
echo "cmake fail."
call :run_fail
goto run_fail
)
) ELSE (
call :gene_protobuf
@ -53,7 +54,7 @@ IF "%1%" == "lite" (
)
IF errorlevel 1 (
echo "build fail."
call :run_fail
goto run_fail
) ELSE (
cd %BASEPATH%/output
rd /s /q _CPack_Packages
@ -63,7 +64,7 @@ IF "%1%" == "lite" (
-G "CodeBlocks - MinGW Makefiles" ../..
IF NOT %errorlevel% == 0 (
echo "cmake fail."
call :run_fail
goto run_fail
)
IF "%1%" == "" (
@ -73,7 +74,7 @@ IF "%1%" == "lite" (
)
IF NOT %errorlevel% == 0 (
echo "build fail."
call :run_fail
goto run_fail
)
)
@ -82,10 +83,14 @@ cd %BASEPATH%
goto run_eof
:run_cmake
set VERSION_MAJOR=1
set VERSION_MINOR=0
set VERSION_REVISION=0
echo "======Start building MindSpore Lite %VERSION_MAJOR%.%VERSION_MINOR%.%VERSION_REVISION%======"
cd %BUILD_PATH%/mindspore
cmake -DBUILD_DEVICE=on -DBUILD_CONVERTER=on -DPLATFORM_ARM64=off -DSUPPORT_TRAIN=off ^
-DCMAKE_BUILD_TYPE=Release -DSUPPORT_GPU=off -DBUILD_MINDDATA=off -DOFFLINE_COMPILE=off ^
-DMS_VERSION_MAJOR=0 -DMS_VERSION_MINOR=7 -DMS_VERSION_REVISION=0 ^
-DMS_VERSION_MAJOR=%VERSION_MAJOR% -DMS_VERSION_MINOR=%VERSION_MINOR% -DMS_VERSION_REVISION=%VERSION_REVISION% ^
-G "CodeBlocks - MinGW Makefiles" "%BASEPATH%/mindspore/lite"
GOTO:EOF
@ -123,6 +128,5 @@ GOTO:EOF
:run_fail
cd %BASEPATH%
set errorlevel=1
EXIT
:run_eof

View File

@ -25,7 +25,7 @@ usage()
echo "bash build.sh [-d] [-r] [-v] [-c on|off] [-t on|off] [-g on|off] [-h] [-b ge] [-m infer|train] \\"
echo " [-a on|off] [-p on|off] [-i] [-L] [-R] [-D on|off] [-j[n]] [-e gpu|d|cpu] \\"
echo " [-P on|off] [-z [on|off]] [-M on|off] [-V 9.2|10.1] [-I arm64|arm32|x86_64] [-K] \\"
echo " [-B on|off] [-w on|off] [-E] [-l on|off] [-n full|lite|off] [-T on|off]"
echo " [-B on|off] [-w on|off] [-E] [-l on|off] [-n full|lite|off]"
echo ""
echo "Options:"
echo " -d Debug mode"
@ -58,7 +58,6 @@ usage()
echo " -B Enable debugger, default on"
echo " -E Enable IBVERBS for parameter server, default off"
echo " -l Compile with python dependency, default on"
echo " -T Enable on-device training, default off"
}
# check value of input is 'on' or 'off'
@ -237,7 +236,7 @@ checkopts()
;;
z)
eval ARG=\$\{$OPTIND\}
if [[ -n $ARG && $ARG != -* ]]; then
if [[ -n "$ARG" && "$ARG" != -* ]]; then
OPTARG="$ARG"
check_on_off $OPTARG z
OPTIND=$((OPTIND + 1))

View File

@ -10,10 +10,10 @@ if (CMAKE_SYSTEM_NAME MATCHES "Windows")
MD5 17757c84f49edd42d34ae8c9288110a1)
else()
mindspore_add_pkg(onednn
VER 1.5
VER 1.7
LIBS dnnl mkldnn
URL https://github.com/oneapi-src/oneDNN/archive/v1.5.tar.gz
MD5 5d97e0e8f4c0b37da5f524533b7a644b
URL https://github.com/oneapi-src/oneDNN/archive/v1.7-rc.tar.gz
MD5 f5e3ce761f6521e235817299f882b965
CMAKE_OPTION -DDNNL_ARCH_OPT_FLAGS='' -DDNNL_CPU_RUNTIME='SEQ' -DDNNL_BUILD_EXAMPLES=OFF -DDNNL_BUILD_TESTS=OFF)
endif()

View File

@ -56,7 +56,7 @@ if (PLATFORM_ARM64)
install(FILES ${TOP_DIR}/mindspore/lite/build/src/libmindspore-lite.so DESTINATION ${LIB_DIR} COMPONENT ${COMPONENT_NAME})
install(FILES ${TOP_DIR}/mindspore/lite/build/src/libmindspore-lite.a DESTINATION ${LIB_DIR} COMPONENT ${COMPONENT_NAME})
install(FILES ${TOP_DIR}/mindspore/core/ir/dtype/type_id.h DESTINATION ${INC_DIR}/ir/dtype COMPONENT ${COMPONENT_NAME})
install(DIRECTORY ${TOP_DIR}/mindspore/lite/include/ DESTINATION ${INC_DIR} COMPONENT ${COMPONENT_NAME} FILES_MATCHING PATTERN "*.h")
install(DIRECTORY ${TOP_DIR}/mindspore/lite/include/ DESTINATION ${INC_DIR} COMPONENT ${COMPONENT_NAME} FILES_MATCHING PATTERN "*.h" PATTERN "train_model.h" EXCLUDE PATTERN "train_session.h" EXCLUDE)
install(DIRECTORY ${TOP_DIR}/mindspore/lite/schema/ DESTINATION ${INC_DIR}/schema COMPONENT ${COMPONENT_NAME} FILES_MATCHING PATTERN "*.h" PATTERN "inner" EXCLUDE)
install(FILES ${TOP_DIR}/mindspore/lite/build/src/libmindspore-lite-optimize.so DESTINATION ${LIB_DIR} COMPONENT ${COMPONENT_NAME})
install(FILES ${TOP_DIR}/mindspore/lite/build/src/libmindspore-lite-fp16.so DESTINATION ${LIB_DIR} COMPONENT ${COMPONENT_NAME})
@ -65,7 +65,7 @@ elseif (PLATFORM_ARM32)
install(FILES ${TOP_DIR}/mindspore/lite/build/src/libmindspore-lite.so DESTINATION ${LIB_DIR} COMPONENT ${COMPONENT_NAME})
install(FILES ${TOP_DIR}/mindspore/lite/build/src/libmindspore-lite.a DESTINATION ${LIB_DIR} COMPONENT ${COMPONENT_NAME})
install(FILES ${TOP_DIR}/mindspore/core/ir/dtype/type_id.h DESTINATION ${INC_DIR}/ir/dtype COMPONENT ${COMPONENT_NAME})
install(DIRECTORY ${TOP_DIR}/mindspore/lite/include/ DESTINATION ${INC_DIR} COMPONENT ${COMPONENT_NAME} FILES_MATCHING PATTERN "*.h")
install(DIRECTORY ${TOP_DIR}/mindspore/lite/include/ DESTINATION ${INC_DIR} COMPONENT ${COMPONENT_NAME} FILES_MATCHING PATTERN "*.h" PATTERN "train_model.h" EXCLUDE PATTERN "train_session.h" EXCLUDE)
install(DIRECTORY ${TOP_DIR}/mindspore/lite/schema/ DESTINATION ${INC_DIR}/schema COMPONENT ${COMPONENT_NAME} FILES_MATCHING PATTERN "*.h" PATTERN "inner" EXCLUDE)
install(DIRECTORY ${TOP_DIR}/third_party/flatbuffers/include DESTINATION ${FLATBF_DIR} COMPONENT ${COMPONENT_NAME})
elseif (CMAKE_SYSTEM_NAME MATCHES "Windows")
@ -75,7 +75,7 @@ elseif (CMAKE_SYSTEM_NAME MATCHES "Windows")
install(FILES ${LIB_LIST} DESTINATION ${TOP_DIR}/build/mindspore/package COMPONENT ${COMPONENT_NAME})
install(FILES ${TOP_DIR}/build/mindspore/tools/converter/libconverter_parser.a DESTINATION ${TOP_DIR}/build/mindspore/package COMPONENT ${PARSER_NAME})
else ()
install(DIRECTORY ${TOP_DIR}/mindspore/lite/include/ DESTINATION ${INC_DIR_RUN_X86} COMPONENT ${RUN_X86_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h")
install(DIRECTORY ${TOP_DIR}/mindspore/lite/include/ DESTINATION ${INC_DIR_RUN_X86} COMPONENT ${RUN_X86_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h" PATTERN "train_model.h" EXCLUDE PATTERN "train_session.h" EXCLUDE)
install(DIRECTORY ${TOP_DIR}/mindspore/lite/schema/ DESTINATION ${INC_DIR_RUN_X86}/schema COMPONENT ${RUN_X86_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h" PATTERN "inner" EXCLUDE)
install(FILES ${TOP_DIR}/mindspore/core/ir/dtype/type_id.h DESTINATION ${INC_DIR_RUN_X86}/ir/dtype COMPONENT ${RUN_X86_COMPONENT_NAME})
install(DIRECTORY ${TOP_DIR}/third_party/flatbuffers/include DESTINATION ${FLATBF_DIR_RUN_X86} COMPONENT ${RUN_X86_COMPONENT_NAME})

View File

@ -0,0 +1,67 @@
FROM ubuntu:18.04
MAINTAINER leonwanghui <leon.wanghui@huawei.com>
# Set env
ENV PYTHON_ROOT_PATH /usr/local/python-3.7.5
ENV PATH /usr/local/bin:$PATH
# Install base tools
RUN apt update \
&& DEBIAN_FRONTEND=noninteractive apt install -y \
vim \
wget \
curl \
xz-utils \
net-tools \
openssh-client \
git \
ntpdate \
tzdata \
tcl \
sudo \
bash-completion
# Install compile tools
RUN DEBIAN_FRONTEND=noninteractive apt install -y \
gcc \
g++ \
zlibc \
make \
libgmp-dev \
patch \
autoconf \
libtool \
automake \
flex
# Set bash
RUN echo "dash dash/sh boolean false" | debconf-set-selections
RUN DEBIAN_FRONTEND=noninteractive dpkg-reconfigure dash
# Install python (v3.7.5)
RUN apt install -y libffi-dev libssl-dev zlib1g-dev libbz2-dev libncurses5-dev \
libgdbm-dev libgdbm-compat-dev liblzma-dev libreadline-dev libsqlite3-dev \
&& cd /tmp \
&& wget https://github.com/python/cpython/archive/v3.7.5.tar.gz \
&& tar -xvf v3.7.5.tar.gz \
&& cd /tmp/cpython-3.7.5 \
&& mkdir -p ${PYTHON_ROOT_PATH} \
&& ./configure --prefix=${PYTHON_ROOT_PATH} \
&& make -j4 \
&& make install -j4 \
&& rm -f /usr/local/bin/python \
&& rm -f /usr/local/bin/pip \
&& ln -s ${PYTHON_ROOT_PATH}/bin/python3.7 /usr/local/bin/python \
&& ln -s ${PYTHON_ROOT_PATH}/bin/pip3.7 /usr/local/bin/pip \
&& rm -rf /tmp/cpython-3.7.5 \
&& rm -f /tmp/v3.7.5.tar.gz
# Set pip source
RUN mkdir -pv /root/.pip \
&& echo "[global]" > /root/.pip/pip.conf \
&& echo "trusted-host=mirrors.aliyun.com" >> /root/.pip/pip.conf \
&& echo "index-url=http://mirrors.aliyun.com/pypi/simple/" >> /root/.pip/pip.conf
# Install MindSpore cpu whl package
RUN pip install --no-cache-dir https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.0.0/MindSpore/cpu/ubuntu_x86/mindspore-1.0.0-cp37-cp37m-linux_x86_64.whl

View File

@ -0,0 +1,81 @@
FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
MAINTAINER leonwanghui <leon.wanghui@huawei.com>
# Set env
ENV PYTHON_ROOT_PATH /usr/local/python-3.7.5
ENV OMPI_ROOT_PATH /usr/local/openmpi-3.1.5
ENV PATH ${OMPI_ROOT_PATH}/bin:/usr/local/bin:$PATH
ENV LD_LIBRARY_PATH ${OMPI_ROOT_PATH}/lib:$LD_LIBRARY_PATH
# Install base tools
RUN apt update \
&& DEBIAN_FRONTEND=noninteractive apt install -y \
vim \
wget \
curl \
xz-utils \
net-tools \
openssh-client \
git \
ntpdate \
tzdata \
tcl \
sudo \
bash-completion
# Install compile tools
RUN DEBIAN_FRONTEND=noninteractive apt install -y \
gcc \
g++ \
zlibc \
make \
libgmp-dev \
patch \
autoconf \
libtool \
automake \
flex
# Set bash
RUN echo "dash dash/sh boolean false" | debconf-set-selections
RUN DEBIAN_FRONTEND=noninteractive dpkg-reconfigure dash
# Install python (v3.7.5)
RUN apt install -y libffi-dev libssl-dev zlib1g-dev libbz2-dev libncurses5-dev \
libgdbm-dev libgdbm-compat-dev liblzma-dev libreadline-dev libsqlite3-dev \
&& cd /tmp \
&& wget https://github.com/python/cpython/archive/v3.7.5.tar.gz \
&& tar -xvf v3.7.5.tar.gz \
&& cd /tmp/cpython-3.7.5 \
&& mkdir -p ${PYTHON_ROOT_PATH} \
&& ./configure --prefix=${PYTHON_ROOT_PATH} \
&& make -j4 \
&& make install -j4 \
&& rm -f /usr/local/bin/python \
&& rm -f /usr/local/bin/pip \
&& ln -s ${PYTHON_ROOT_PATH}/bin/python3.7 /usr/local/bin/python \
&& ln -s ${PYTHON_ROOT_PATH}/bin/pip3.7 /usr/local/bin/pip \
&& rm -rf /tmp/cpython-3.7.5 \
&& rm -f /tmp/v3.7.5.tar.gz
# Set pip source
RUN mkdir -pv /root/.pip \
&& echo "[global]" > /root/.pip/pip.conf \
&& echo "trusted-host=mirrors.aliyun.com" >> /root/.pip/pip.conf \
&& echo "index-url=http://mirrors.aliyun.com/pypi/simple/" >> /root/.pip/pip.conf
# Install openmpi (v3.1.5)
RUN cd /tmp \
&& wget https://download.open-mpi.org/release/open-mpi/v3.1/openmpi-3.1.5.tar.gz \
&& tar -xvf openmpi-3.1.5.tar.gz \
&& cd /tmp/openmpi-3.1.5 \
&& mkdir -p ${OMPI_ROOT_PATH} \
&& ./configure --prefix=${OMPI_ROOT_PATH} \
&& make -j4 \
&& make install -j4 \
&& rm -rf /tmp/openmpi-3.1.5 \
&& rm -f /tmp/openmpi-3.1.5.tar.gz
# Install MindSpore cuda-10.1 whl package
RUN pip install --no-cache-dir https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.0.0/MindSpore/gpu/ubuntu_x86/cuda-10.1/mindspore_gpu-1.0.0-cp37-cp37m-linux_x86_64.whl

@ -1 +1 @@
Subproject commit 4f6ebe0f92da8c758b1b71a2bc7ff33f9344e18a
Subproject commit 14db109491bc81473905a5eb9e82f6234aca419b

View File

@ -74,7 +74,11 @@ class InferTensorBase {
if (data_len == 0) {
return true;
}
memcpy_s(mutable_data(), data_size(), data, data_len);
auto ret = memcpy_s(mutable_data(), data_size(), data, data_len);
if (ret != 0) {
MSI_LOG_ERROR << "Set data memcpy_s failed";
return false;
}
return true;
}

View File

@ -81,7 +81,6 @@ class MS_API InferSession {
}
static std::shared_ptr<InferSession> CreateSession(const std::string &device, uint32_t device_id);
};
} // namespace inference
} // namespace mindspore
#endif // MINDSPORE_INCLUDE_MS_SESSION_H

View File

@ -13,7 +13,6 @@
# limitations under the License.
# ============================================================================
"""tbe common"""
import json
import os
class TBEException(Exception):
@ -27,23 +26,6 @@ class TBEException(Exception):
return self.__error_msg
def get_ddk_version():
"""get ddk version"""
ddk_version = os.environ.get("DDK_VERSION")
if ddk_version is None:
default_ddk_info_file = '/usr/local/HiAI/runtime/ddk_info'
backup_ddk_info_file = '/usr/local/Ascend/fwkacllib/ddk_info'
if os.path.exists(default_ddk_info_file):
with open(default_ddk_info_file, "r") as fp:
ddk_version = json.load(fp)["VERSION"]
elif os.path.exists(backup_ddk_info_file):
with open(backup_ddk_info_file, "r") as fp:
ddk_version = json.load(fp)["VERSION"]
else:
ddk_version = "Ascend910"
return ddk_version
def get_build_in_impl_path():
"""get build-in tbe implement path"""
tbe_impl_path = os.environ.get("TBE_IMPL_PATH")

View File

@ -18,9 +18,8 @@ import os
import sys
from te.platform.cce_conf import te_set_version
from te.platform.fusion_util import fusion_op
from common import check_kernel_info, get_args, get_build_in_impl_path, get_ddk_version
from common import check_kernel_info, get_args, get_build_in_impl_path
ddk_version = get_ddk_version()
build_in_impl_path = get_build_in_impl_path()
# op function list
@ -30,7 +29,6 @@ fusion_pattern_end_flag = "fusion_pattern_end"
def _initialize(impl_path):
"""Initialize"""
te_set_version(ddk_version)
if impl_path == "":
op_module_name = build_in_impl_path
else:
@ -53,7 +51,7 @@ def build_op(build_type, json_str):
"""
kernel_info = json.loads(json_str)
check_kernel_info(kernel_info)
te_set_version(kernel_info["op_info"]["socVersion"])
op_name = kernel_info['op_info']['name']
try:
@ -111,7 +109,7 @@ def compile_fusion_op(json_str):
Exception: If specific keyword is not found.
"""
args = json.loads(json_str)
te_set_version(ddk_version)
te_set_version(args['fusion_op']["socVersion"])
if 'fusion_op' not in args or not args['fusion_op']:
raise ValueError("Json string Errors, key:fusion_op not found.")
fusion_op_arg = args['fusion_op']

View File

@ -104,7 +104,7 @@ class TbeProcess:
def __init__(self):
self.__processe_num = multiprocessing.cpu_count()
# max_processes_num: Set the maximum number of concurrent processes for compiler
max_processes_num = 16
max_processes_num = 24
if self.__processe_num > max_processes_num:
self.__processe_num = max_processes_num
self.__pool = None

View File

@ -122,15 +122,16 @@ class AscendEnvChecker(EnvChecker):
def __init__(self):
self.version = ["1.75.22.0.220"]
atlas_fwk_version = "/usr/local/Ascend/nnae/latest/fwkacllib/version.info"
atlas_fwk_version = "/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/version.info"
hisi_fwk_version = "/usr/local/Ascend/fwkacllib/version.info"
if os.path.exists(atlas_fwk_version):
# atlas default path
self.fwk_path = "/usr/local/Ascend/nnae/latest/fwkacllib"
self.op_impl_path = "/usr/local/Ascend/nnae/latest/opp/op_impl/built-in/ai_core/tbe"
self.fwk_path = "/usr/local/Ascend/ascend-toolkit/latest/fwkacllib"
self.op_impl_path = "/usr/local/Ascend/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe"
self.tbe_path = self.fwk_path + "/lib64"
self.cce_path = self.fwk_path + "/ccec_compiler/bin"
self.fwk_version = atlas_fwk_version
self.op_path = "/usr/local/Ascend/ascend-toolkit/latest/opp"
elif os.path.exists(hisi_fwk_version):
# hisi default path
self.fwk_path = "/usr/local/Ascend/fwkacllib"
@ -138,6 +139,7 @@ class AscendEnvChecker(EnvChecker):
self.tbe_path = self.fwk_path + "/lib64"
self.cce_path = self.fwk_path + "/ccec_compiler/bin"
self.fwk_version = hisi_fwk_version
self.op_path = ""
else:
# custom or unknown environment
self.fwk_path = ""
@ -145,17 +147,20 @@ class AscendEnvChecker(EnvChecker):
self.tbe_path = ""
self.cce_path = ""
self.fwk_version = ""
self.op_path = ""
# env
self.path = os.getenv("PATH")
self.python_path = os.getenv("PYTHONPATH")
self.ld_lib_path = os.getenv("LD_LIBRARY_PATH")
self.ascend_opp_path = os.getenv("ASCEND_OPP_PATH")
# check content
self.path_check = "/fwkacllib/ccec_compiler/bin/"
self.python_path_check = "opp/op_impl/built_in/ai_core/tbe/"
self.ld_lib_path_check_fwk = "/fwkacllib/lib64/"
self.ld_lib_path_check_addons = "/add-ons/"
self.ascend_opp_path_check = "/op"
self.v = ""
def check_env(self, e):
@ -205,6 +210,15 @@ class AscendEnvChecker(EnvChecker):
f"No such directory: {self.cce_path}, Please check if Ascend 910 AI software package is "
"installed correctly.")
if self.op_path is None:
pass
elif Path(self.op_path).is_dir():
os.environ['ASCEND_OPP_PATH'] = self.op_path
else:
raise EnvironmentError(
f"No such directory: {self.op_path}, Please check if Ascend 910 AI software package is "
"installed correctly.")
def _check_env(self):
"""ascend dependence path check"""
if self.path is None or self.path_check not in self.path:
@ -223,6 +237,11 @@ class AscendEnvChecker(EnvChecker):
"LD_LIBRARY_PATH, you can reference to the installation guidelines "
"https://www.mindspore.cn/install")
if self.ascend_opp_path is None or self.ascend_opp_path_check not in self.ascend_opp_path:
logger.warning(
"Can not find opp path (need by mindspore-ascend), please check if you have set env ASCEND_OPP_PATH, "
"you can reference to the installation guidelines https://www.mindspore.cn/install")
def _read_version(self, file_path):
"""get ascend version info"""
with open(file_path, 'r') as f:

View File

@ -258,7 +258,7 @@ else ()
endif ()
set(MINDSPORE_RPATH ${ORIGIN_PATH}/lib)
if (ENABLE_D)
set(MINDSPORE_RPATH ${MINDSPORE_RPATH}:/usr/local/Ascend/nnae/latest/fwkacllib/lib64)
set(MINDSPORE_RPATH ${MINDSPORE_RPATH}:/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/lib64)
set(MINDSPORE_RPATH ${MINDSPORE_RPATH}:/usr/local/Ascend/fwkacllib/lib64)
set(MINDSPORE_RPATH ${MINDSPORE_RPATH}:/usr/local/Ascend/add-ons)
elseif (ENABLE_GPU)

View File

@ -0,0 +1,139 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <string>
#include "backend/kernel_compiler/cpu/mkldnn/fused_batch_norm_grad_cpu_kernel.h"
#include "utils/ms_utils.h"
#include "backend/kernel_compiler/cpu/mkldnn/mkl_kernel_engine.h"
#include "runtime/device/cpu/cpu_device_address.h"
namespace mindspore {
namespace kernel {
void FusedBatchNormGradCPUKernel::InitInputOutputSize(const CNodePtr &kernel_node) {
CPUKernel::InitInputOutputSize(kernel_node);
MS_EXCEPTION_IF_NULL(kernel_node);
size_t type_size = sizeof(float);
std::vector<size_t> shape = AnfAlgo::GetInputDeviceShape(kernel_node, 0);
// workspace for diff scale and diff bias
size_t tensor_size = shape[1] * 2 * type_size;  // [2, C] to store scale and bias
workspace_size_list_.emplace_back(tensor_size);
}
void FusedBatchNormGradCPUKernel::InitKernel(const CNodePtr &kernel_node) {
MS_EXCEPTION_IF_NULL(kernel_node);
MS_LOG(INFO) << "start init FusedBatchNormGradCPUKernel " ;
auto node_name = AnfAlgo::GetCNodeName(kernel_node);
/**
* create desc for dy and x
*/
//get dy and x
std::vector<size_t> dy_shape = AnfAlgo::GetInputDeviceShape(kernel_node, 0);
std::vector<size_t> x_shape = AnfAlgo::GetInputDeviceShape(kernel_node, 1);
// dy and x must be NCHW, so check both shapes.
if (dy_shape.size() != 4 || x_shape.size() != 4) {
MS_LOG(EXCEPTION) << "FusedBatchNormGrad only supports NCHW input!";
}
// descriptors for x, y, dy, dx
batch_size = x_shape[0];
channel = x_shape[1];
hw_size = x_shape[2] * x_shape[3];
nhw_size = x_shape[0] * hw_size;
dnnl::memory::desc x_desc = GetDefaultMemDesc(x_shape);
// create scale_bias_desc
dnnl::memory::desc scale_desc = GetDefaultMemDesc({1, channel});
auto epsilon = AnfAlgo::GetNodeAttr<float>(kernel_node, "epsilon");
// Define the forward hint used to derive the backward primitive.
auto f_prop_kind = dnnl::prop_kind::forward_training;                     // forward prop kind
auto f_normalization_flags = dnnl::normalization_flags::use_scale_shift;  // forward normalization flags
dnnl::batch_normalization_forward::desc fhint_desc =
dnnl::batch_normalization_forward::desc(f_prop_kind, x_desc, epsilon, f_normalization_flags);
auto fhint_prim_desc = dnnl::batch_normalization_forward::primitive_desc(fhint_desc, MKLKernelEngine::Get().engine());
// Define the backward primitive.
auto bprop_kind = dnnl::prop_kind::backward;
auto bnormalization_flags = dnnl::normalization_flags::use_scale_shift;
// create desc for backward
dnnl::batch_normalization_backward::desc b_desc =
dnnl::batch_normalization_backward::desc(bprop_kind, fhint_prim_desc.dst_desc(), x_desc, epsilon, bnormalization_flags);
// create prim_desc for backward
auto bprim_desc = dnnl::batch_normalization_backward::primitive_desc(b_desc, MKLKernelEngine::Get().engine(), fhint_prim_desc);
primitive_ = std::make_shared<dnnl::batch_normalization_backward>(bprim_desc);
// Register arguments for the backward primitive.
AddArgument(DNNL_ARG_SRC, bprim_desc.src_desc());                        // x
AddArgument(DNNL_ARG_DIFF_DST, bprim_desc.diff_dst_desc());              // dy
AddArgument(DNNL_ARG_SCALE_SHIFT, scale_desc);                           // only scale is needed
AddArgument(DNNL_ARG_MEAN, bprim_desc.mean_desc());                      // mean
AddArgument(DNNL_ARG_VARIANCE, bprim_desc.variance_desc());              // variance
AddArgument(DNNL_ARG_DIFF_SRC, bprim_desc.diff_src_desc());              // dx
AddArgument(DNNL_ARG_DIFF_SCALE_SHIFT, bprim_desc.diff_weights_desc());  // diff scale and diff bias
AddArgument(DNNL_ARG_WORKSPACE, bprim_desc.workspace_desc());            // workspace
}
bool FusedBatchNormGradCPUKernel::Launch(const std::vector<kernel::AddressPtr> &inputs,
const std::vector<kernel::AddressPtr> &workspace,
const std::vector<kernel::AddressPtr> &outputs) {
// Inputs: dy, x, scale, save_mean, save_inv_variance (5 in total).
MS_LOG(INFO) << "Start launch FusedBatchNormGradCPUKernel";
if (inputs.size() < 5 || outputs.empty()) {
MS_LOG(EXCEPTION) << "Error input output size!";
}
SetArgumentHandle(DNNL_ARG_SRC, inputs[1]->addr);                  // x
SetArgumentHandle(DNNL_ARG_DIFF_DST, inputs[0]->addr);             // dy
SetArgumentHandle(DNNL_ARG_SCALE_SHIFT, inputs[2]->addr);          // only scale is needed
SetArgumentHandle(DNNL_ARG_MEAN, inputs[3]->addr);                 // mean
SetArgumentHandle(DNNL_ARG_VARIANCE, inputs[4]->addr);             // variance
SetArgumentHandle(DNNL_ARG_DIFF_SRC, outputs[0]->addr);            // dx
SetArgumentHandle(DNNL_ARG_DIFF_SCALE_SHIFT, workspace[0]->addr);  // diff scale and diff bias
ExecutePrimitive();
// Split the [2, C] workspace into diff scale (output 1) and diff bias (output 2).
auto scale = reinterpret_cast<float *>(outputs[1]->addr);
auto bias = reinterpret_cast<float *>(outputs[2]->addr);
auto wksp = reinterpret_cast<float *>(workspace[0]->addr);
auto scale_ret = memcpy_s(scale, outputs[1]->size, workspace[0]->addr, outputs[1]->size);
auto bias_ret = memcpy_s(bias, outputs[2]->size, wksp + (outputs[1]->size / sizeof(float)), outputs[2]->size);
if (scale_ret != 0 || bias_ret != 0) {
MS_LOG(EXCEPTION) << "Memcpy_s error.";
return false;
}
MS_LOG(INFO) << "Done FusedBatchNormGrad";
return true;
}
} // namespace kernel
} // namespace mindspore

View File

@ -0,0 +1,63 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_FUSED_BATCH_NORM_GRAD_CPU_KERNEL_H_
#define MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_FUSED_BATCH_NORM_GRAD_CPU_KERNEL_H_
#include <vector>
#include <memory>
#include "backend/kernel_compiler/cpu/mkldnn/mkl_cpu_kernel.h"
namespace mindspore {
namespace kernel {
class FusedBatchNormGradCPUKernel : public MKLCPUKernel {
public:
FusedBatchNormGradCPUKernel() = default;
~FusedBatchNormGradCPUKernel() override = default;
void InitKernel(const CNodePtr &kernel_node) override;
bool Launch(const std::vector<AddressPtr> &inputs, const std::vector<AddressPtr> &workspace,
const std::vector<AddressPtr> &outputs) override;
protected:
void InitInputOutputSize(const CNodePtr &kernel_node) override;
private:
size_t batch_size{0};
size_t channel{0};
size_t hw_size{0};
size_t nhw_size{0};
};
// register FusedBatchNormGrad
MS_REG_CPU_KERNEL(FusedBatchNormGrad,
KernelAttr()
.AddInputAttr(kNumberTypeFloat32)
.AddInputAttr(kNumberTypeFloat32)
.AddInputAttr(kNumberTypeFloat32)
.AddInputAttr(kNumberTypeFloat32)
.AddInputAttr(kNumberTypeFloat32)
.AddOutputAttr(kNumberTypeFloat32)
.AddOutputAttr(kNumberTypeFloat32)
.AddOutputAttr(kNumberTypeFloat32),
FusedBatchNormGradCPUKernel)
} // namespace kernel
} // namespace mindspore
#endif // MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_FUSED_BATCH_NORM_GRAD_CPU_KERNEL_H_

View File

@ -0,0 +1,67 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "backend/kernel_compiler/cpu/mkldnn/realdiv_cpu_kernel.h"
#include "backend/kernel_compiler/cpu/mkldnn/mkl_kernel_engine.h"
#include "runtime/device/cpu/cpu_device_address.h"
#include "utils/ms_utils.h"
namespace mindspore {
namespace kernel {
void RealDivCPUKernel::InitKernel(const CNodePtr &kernel_node) {
MS_EXCEPTION_IF_NULL(kernel_node);
MS_LOG(INFO) << " Init Realdiv " ;
std::vector<size_t> src0_shape = AnfAlgo::GetInputDeviceShape(kernel_node, 0);
std::vector<size_t> src1_shape = AnfAlgo::GetInputDeviceShape(kernel_node, 1);
std::vector<size_t> dst_shape = AnfAlgo::GetOutputDeviceShape(kernel_node, 0);
if (src0_shape.size() != src1_shape.size() && src1_shape.size() > 1) {
MS_LOG(EXCEPTION) << "RealDivCPUKernel : RealDiv only support same dim input or tensor / scalar " << src0_shape.size() << " vs "
<< src1_shape.size();
}
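// Broadcast: pad src1's shape with trailing 1s up to src0's rank.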
if (src1_shape.size() < src0_shape.size()) {
for (size_t i = src1_shape.size(); i < src0_shape.size(); ++i) {
src1_shape.emplace_back(1);
}
}
dnnl::memory::desc src0_desc = GetDefaultMemDesc(src0_shape);
dnnl::memory::desc src1_desc = GetDefaultMemDesc(src1_shape);
dnnl::memory::desc dst_desc = GetDefaultMemDesc(dst_shape);
dnnl::binary::desc desc = dnnl::binary::desc(dnnl::algorithm::binary_div, src0_desc, src1_desc, dst_desc);
auto prim_desc = dnnl::binary::primitive_desc(desc, MKLKernelEngine::Get().engine());
primitive_ = std::make_shared<dnnl::binary>(prim_desc);
AddArgument(DNNL_ARG_SRC_0, src0_desc);
AddArgument(DNNL_ARG_SRC_1, src1_desc);
AddArgument(DNNL_ARG_DST, dst_desc);
}
bool RealDivCPUKernel::Launch(const std::vector<kernel::AddressPtr> &inputs,
const std::vector<kernel::AddressPtr> & /*workspace*/,
const std::vector<kernel::AddressPtr> &outputs) {
MS_LOG(INFO) << "RealDivCPUKernel : start Launch Realdiv" ;
if (inputs.size() < 2 || outputs.empty()) {
MS_LOG(EXCEPTION) << "RealDivCPUKernel : Realdiv error input output size!";
}
SetArgumentHandle(DNNL_ARG_SRC_0, inputs[0]->addr);
SetArgumentHandle(DNNL_ARG_SRC_1, inputs[1]->addr);
SetArgumentHandle(DNNL_ARG_DST, outputs[0]->addr);
ExecutePrimitive();
return true;
}
} // namespace kernel
} // namespace mindspore

View File

@ -0,0 +1,42 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_REALDIV_CPU_KERNEL_H_
#define MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_REALDIV_CPU_KERNEL_H_
#include <vector>
#include <memory>
#include "backend/kernel_compiler/cpu/mkldnn/mkl_cpu_kernel.h"
namespace mindspore {
namespace kernel {
class RealDivCPUKernel : public MKLCPUKernel {
public:
RealDivCPUKernel() = default;
~RealDivCPUKernel() override = default;
void InitKernel(const CNodePtr &kernel_node) override;
bool Launch(const std::vector<AddressPtr> &inputs, const std::vector<AddressPtr> &workspace,
const std::vector<AddressPtr> &outputs) override;
};
MS_REG_CPU_KERNEL(
RealDiv,
KernelAttr().AddInputAttr(kNumberTypeFloat32).AddInputAttr(kNumberTypeFloat32).AddOutputAttr(kNumberTypeFloat32),
RealDivCPUKernel);
} // namespace kernel
} // namespace mindspore
#endif // MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_REALDIV_CPU_KERNEL_H_

View File

@ -39,6 +39,7 @@ void ReduceCPUKernel::InitKernel(const CNodePtr &kernel_node) {
}
shape_ = AnfAlgo::GetInputDeviceShape(kernel_node, 0);
auto axis_addr = AnfAlgo::GetCNodePrimitive(kernel_node)->GetAttr(AXIS);
if (axis_addr->isa<ValueTuple>()) {
auto attr_axis = AnfAlgo::GetNodeAttr<std::vector<int>>(kernel_node, AXIS);
if (attr_axis.size() > shape_.size()) {
@ -47,18 +48,24 @@ void ReduceCPUKernel::InitKernel(const CNodePtr &kernel_node) {
axis_.push_back(shape_.size() - 1);
} else {
for (auto axis : attr_axis) {
while (axis < 0) {
axis += SizeToInt(shape_.size());
}
if (IntToSize(axis) >= (shape_.size())) {
MS_LOG(EXCEPTION) << "axis value is oversize.";
}
axis < 0 ? axis_.push_back(axis + shape_.size()) : axis_.push_back(axis);
axis_.push_back(IntToSize(axis));
}
}
} else if (axis_addr->isa<Int32Imm>()) {
int axis = AnfAlgo::GetNodeAttr<int>(kernel_node, AXIS);
if (axis >= 0 && IntToSize(axis) >= shape_.size()) {
while (axis < 0) {
axis += SizeToInt(shape_.size());
}
if (IntToSize(axis) >= shape_.size()) {
MS_LOG(EXCEPTION) << "axis value is oversize.";
}
axis < 0 ? axis_.push_back(axis + shape_.size()) : axis_.push_back(axis);
axis_.push_back(IntToSize(axis));
} else {
MS_LOG(EXCEPTION) << "Attribute axis type is invalid.";
}

View File

@ -0,0 +1,173 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "backend/kernel_compiler/cpu/tile_cpu_kernel.h"
#include "runtime/device/cpu/cpu_device_address.h"
namespace mindspore {
namespace kernel {
size_t TileCPUKernel::GetOuterSize(std::vector<size_t> shape, int index) {
size_t size = 1;
for (int i = 0; i < index; i++) {
size *= shape.at(i);
}
return size;
}
size_t TileCPUKernel::GetInnerSize(std::vector<size_t> shape, int index) {
size_t size = 1;
int shape_size = shape.size();
for (int i = index; i < shape_size; i++) {
size *= shape.at(i);
}
return size;
}
size_t TileCPUKernel::GetTypeSize() {
size_t type_size = 1;
if (InputTensorType == kNumberTypeInt32) {
type_size = sizeof(int);
} else if (InputTensorType == kNumberTypeInt64) {
type_size = sizeof(int64_t);
} else if (InputTensorType == kNumberTypeInt16) {
type_size = sizeof(int16_t);
} else if (InputTensorType == kNumberTypeFloat32) {
type_size = sizeof(float);
}
return type_size;
}
void TileCPUKernel::InitKernel(const CNodePtr &kernel_node) {
MS_LOG(INFO) << " start Init Tile kernel";
MS_EXCEPTION_IF_NULL(kernel_node);
src0_shape = AnfAlgo::GetInputDeviceShape(kernel_node, 0);
multiples = AnfAlgo::GetNodeAttr<std::vector<int>>(kernel_node, "multiples");
dst_shape = AnfAlgo::GetOutputDeviceShape(kernel_node, 0);
if (src0_shape.size() > multiples.size()) {
MS_LOG(EXCEPTION) << "The rank of input is greater than multiples.";
}
// make src0_shape align to multiples
if (src0_shape.size() < multiples.size()) {
for (size_t i = src0_shape.size(); i < multiples.size(); ++i) {
src0_shape.emplace(src0_shape.begin(),1);
}
}
if(src0_shape.size() != multiples.size()){
MS_LOG(EXCEPTION) << "make src0_shape align to multiples failed.";
}
size_t output_tensor_size = 1;
InputTensorType = AnfAlgo::GetInputDeviceDataType(kernel_node, 0);
for(auto each: dst_shape){
output_tensor_size *= each ;
}
output_tensor_size *= GetTypeSize();
workspace_size_list_.emplace_back(output_tensor_size);
}
bool TileCPUKernel::Launch(const std::vector<kernel::AddressPtr> &inputs,
const std::vector<kernel::AddressPtr> &workspace,
const std::vector<kernel::AddressPtr> &outputs) {
MS_LOG(INFO) << "Start launch Tile kernel";
if (inputs.empty() || outputs.empty()) {
MS_LOG(EXCEPTION) << "input or output empty!";
}
if (inputs[0]->size > outputs[0]->size) {
return false;
}
if (InputTensorType == kNumberTypeInt32) {
LaunchKernel<int>(inputs, workspace, outputs);
} else if (InputTensorType == kNumberTypeInt64) {
LaunchKernel<int64_t>(inputs, workspace, outputs);
} else if (InputTensorType == kNumberTypeInt16) {
LaunchKernel<int16_t>(inputs, workspace, outputs);
} else if (InputTensorType == kNumberTypeFloat32) {
LaunchKernel<float>(inputs, workspace, outputs);
}
return true;
}
// Launch kernel based on input type.
template <typename T>
void TileCPUKernel::LaunchKernel(const std::vector<kernel::AddressPtr> &inputs,
const std::vector<kernel::AddressPtr> &workspace,
const std::vector<kernel::AddressPtr> &outputs) {
size_t shape_size = src0_shape.size();
size_t item_size = sizeof(T);
auto X_ptr = reinterpret_cast<T *>(inputs[0]->addr);
auto Y_ptr = reinterpret_cast<T *>(outputs[0]->addr);
auto wksp_temp = reinterpret_cast<T *>(workspace[0]->addr);  // workspace for the intermediate result
auto start_of_wksp = reinterpret_cast<T *>(workspace[0]->addr);
auto start_of_Yptr = reinterpret_cast<T *>(outputs[0]->addr);
memcpy_s(Y_ptr, inputs[0]->size, X_ptr, inputs[0]->size);
int data_count = 1;
auto temp_dst_shape = src0_shape;  // shape of the intermediate result
// Tile one axis at a time, from the innermost axis outwards.
for (int axis_t = shape_size - 1; axis_t >= 0; axis_t--) {
auto temp_dst_size = GetInnerSize(temp_dst_shape, 0) * item_size;  // byte size of the intermediate result
memcpy_s(wksp_temp, temp_dst_size, Y_ptr, temp_dst_size);
int outer_size = GetOuterSize(temp_dst_shape, axis_t);
int inner_size = GetInnerSize(temp_dst_shape, axis_t);
int tile = multiples[axis_t];
int item_count = inner_size;
data_count = item_count * item_size;
for (int i = 0; i < outer_size; i++) {
for (int t = 0; t < tile; t++) {
memcpy_s(Y_ptr, data_count, wksp_temp, data_count);  // copy one inner block to Y
Y_ptr += item_count;
}
wksp_temp += item_count;
}
// reset to the start
Y_ptr = start_of_Yptr;
wksp_temp = start_of_wksp;
temp_dst_shape[axis_t] = dst_shape[axis_t];  // this axis is now fully tiled
}
}
} // namespace kernel
} // namespace mindspore

View File

@ -0,0 +1,64 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_TILE_CPU_KERNEL_H_
#define MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_TILE_CPU_KERNEL_H_
#include <vector>
#include <memory>
#include "backend/kernel_compiler/cpu/cpu_kernel.h"
#include "backend/kernel_compiler/cpu/cpu_kernel_factory.h"
namespace mindspore {
namespace kernel {
class TileCPUKernel : public CPUKernel {
public:
TileCPUKernel() = default;
~TileCPUKernel() override = default;
void InitKernel(const CNodePtr &kernel_node) override;
bool Launch(const std::vector<AddressPtr> &inputs, const std::vector<AddressPtr> &workspace,
const std::vector<AddressPtr> &outputs) override;
private:
// launch kernel base on input type
template <typename T>
void LaunchKernel(const std::vector<kernel::AddressPtr> &inputs,
const std::vector<kernel::AddressPtr> &workspace,
const std::vector<kernel::AddressPtr> &outputs);
size_t GetOuterSize(std::vector<size_t> shape, int index);
size_t GetInnerSize(std::vector<size_t> shape, int index);
size_t GetTypeSize();
TypeId InputTensorType{kNumberTypeInt32};
std::vector<size_t> src0_shape;
std::vector<int> multiples;
std::vector<size_t> dst_shape;
};
MS_REG_CPU_KERNEL(Tile, KernelAttr().AddInputAttr(kNumberTypeFloat32).AddOutputAttr(kNumberTypeFloat32),
TileCPUKernel);
MS_REG_CPU_KERNEL(Tile, KernelAttr().AddInputAttr(kNumberTypeInt32).AddOutputAttr(kNumberTypeInt32),
TileCPUKernel);
MS_REG_CPU_KERNEL(Tile, KernelAttr().AddInputAttr(kNumberTypeInt64).AddOutputAttr(kNumberTypeInt64),
TileCPUKernel);
MS_REG_CPU_KERNEL(Tile, KernelAttr().AddInputAttr(kNumberTypeInt16).AddOutputAttr(kNumberTypeInt16),
TileCPUKernel);
} // namespace kernel
} // namespace mindspore
#endif // MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_TILE_CPU_KERNEL_H_

View File

@ -0,0 +1,446 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <limits>
#include "ctcloss_impl.cuh"
#include "runtime/device/gpu/cuda_common.h"
template <typename T>
__device__ T LogSumExp(const T logprob1, const T logprob2) {
if (logprob1 == logprob2 && logprob1 == -std::numeric_limits<T>::infinity()) {
return logprob1;
} else {
return (logprob1 > logprob2) ? logprob1 + log1pf(expf(logprob2 - logprob1))
: logprob2 + log1pf(expf(logprob1 - logprob2));
}
}
template <typename T>
__global__ void CalculateFwdVarKernel(T *log_alpha_b, int *label_value_with_blank, T *softmax_probs,
const int *sequence_length, bool ctc_merge_repeated, int batch, int SOffSet,
int maxtime, int blank, int *label_squence_length, int *cum_labels_length,
bool ignore_longer_outputs_than_inputs) {
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < batch; i += blockDim.x * gridDim.x) {
if (sequence_length[i] == 0 ||
(ignore_longer_outputs_than_inputs && label_squence_length[i] > sequence_length[i])) {
} else {
T *log_alpha_b_cur = &log_alpha_b[i * SOffSet * maxtime];
int *label_value_with_blank_cur = &label_value_with_blank[0];
if (i > 0) {
label_value_with_blank_cur = &label_value_with_blank[2 * cum_labels_length[i - 1] + i];
}
int numclass = blank + 1;
int U = 2 * label_squence_length[i] + 1;
int Ti = sequence_length[i];
int low = 0;
int high = 0;
log_alpha_b_cur[0] = log(softmax_probs[i * numclass + blank]);
int label0 = blank;
if (U > 1) {
label0 = label_value_with_blank_cur[1];
log_alpha_b_cur[maxtime] = log(softmax_probs[i * numclass + label0]);
}
for (int t = 1; t < Ti; ++t) {
low = 0;
high = U;
int low_limit = U - (2 * (Ti - t));
int high_limit = 2 * (t + 1);
if (low_limit > low) {
low = low_limit;
}
if (high_limit < U) {
high = high_limit;
}
for (int u = low; u < high; ++u) {
T sum_log_alpha = -std::numeric_limits<T>::infinity();
if (ctc_merge_repeated || label_value_with_blank_cur[u] == blank) {
sum_log_alpha = log_alpha_b_cur[u * maxtime + t - 1];
}
if (u > 0) {
sum_log_alpha = LogSumExp(sum_log_alpha, log_alpha_b_cur[(u - 1) * maxtime + t - 1]);
}
if (u > 1) {
const bool matching_labels_merge =
ctc_merge_repeated && (label_value_with_blank_cur[u] == label_value_with_blank_cur[u - 2]);
if (label_value_with_blank_cur[u] != blank && !matching_labels_merge) {
sum_log_alpha = LogSumExp(sum_log_alpha, log_alpha_b_cur[(u - 2) * maxtime + t - 1]);
}
}
log_alpha_b_cur[u * maxtime + t] =
log(softmax_probs[i * numclass + label_value_with_blank_cur[u] + t * numclass * batch]) + sum_log_alpha;
}
}
}
}
}
template <typename T>
__global__ void CalculateBwdVarKernel(T *log_beta_b, int *label_value_with_blank, T *softmax_probs,
const int *sequence_length, bool ctc_merge_repeated, int batch, int SOffSet,
int maxtime, int blank, int *label_squence_length, int *cum_labels_length,
bool ignore_longer_outputs_than_inputs) {
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < batch; i += blockDim.x * gridDim.x) {
if (sequence_length[i] == 0 ||
(ignore_longer_outputs_than_inputs && label_squence_length[i] > sequence_length[i])) {
} else {
T *log_beta_b_cur = &log_beta_b[i * SOffSet * maxtime];
int *label_value_with_blank_cur = &label_value_with_blank[0];
if (i > 0) {
label_value_with_blank_cur = &label_value_with_blank[2 * cum_labels_length[i - 1] + i];
}
int numclass = blank + 1;
int U = 2 * label_squence_length[i] + 1;
int Ti = sequence_length[i];
int low = 0;
int high = 0;
if (U > 1) {
for (int u = U - 2; u < U; ++u) {
log_beta_b_cur[u * maxtime + Ti - 1] = 0;
}
} else {
log_beta_b_cur[Ti - 1] = 0;
log_beta_b_cur[Ti - 2] = 0;
}
for (int t = Ti - 2; t >= 0; --t) {
low = 0;
high = U;
int low_limit = U - (2 * (Ti - t));
int high_limit = 2 * (t + 1);
if (low_limit > low) {
low = low_limit;
}
if (high_limit < U) {
high = high_limit;
}
for (int u = low; u < high; ++u) {
if (ctc_merge_repeated || label_value_with_blank_cur[u] == blank) {
log_beta_b_cur[u * maxtime + t] = LogSumExp(
log_beta_b_cur[u * maxtime + t],
log_beta_b_cur[u * maxtime + t + 1] +
log(softmax_probs[i * numclass + label_value_with_blank_cur[u] + (t + 1) * numclass * batch]));
}
if (u + 1 < U) {
log_beta_b_cur[u * maxtime + t] = LogSumExp(
log_beta_b_cur[u * maxtime + t],
log_beta_b_cur[(u + 1) * maxtime + t + 1] +
log(softmax_probs[i * numclass + label_value_with_blank_cur[u + 1] + (t + 1) * numclass * batch]));
}
if (u + 2 < U) {
const bool matching_labels_merge =
ctc_merge_repeated && (label_value_with_blank_cur[u] == label_value_with_blank_cur[u + 2]);
if (label_value_with_blank_cur[u] != blank && !matching_labels_merge) {
log_beta_b_cur[u * maxtime + t] = LogSumExp(
log_beta_b_cur[u * maxtime + t],
log_beta_b_cur[(u + 2) * maxtime + t + 1] +
log(softmax_probs[i * numclass + label_value_with_blank_cur[u + 2] + (t + 1) * numclass * batch]));
}
}
}
}
}
}
}
template <typename T>
__global__ void ProbInitKernel(T *prob_num, int size) {
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < size; i += blockDim.x * gridDim.x) {
prob_num[i] = -std::numeric_limits<T>::infinity();
}
}
template <typename T>
__global__ void LogBInitKernel(T *log_b, int log_prob_size) {
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < log_prob_size; i += blockDim.x * gridDim.x) {
log_b[i] = -std::numeric_limits<T>::infinity();
}
}
template <typename T>
__global__ void CTCLossKernel(T *log_alpha_b, T *log_beta_b, T *softmax_probs, int *label_value_with_blank, int batch,
int SOffSet, int maxtime, int numclass, const int *sequence_length,
int *label_squence_length, int *cum_labels_length, T *cost, T *grads, T *prob_num,
bool ignore_longer_outputs_than_inputs) {
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < batch; i += blockDim.x * gridDim.x) {
if (sequence_length[i] == 0 ||
(ignore_longer_outputs_than_inputs && label_squence_length[i] > sequence_length[i])) {
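      // Invalid sample: cost and gradients are left at their caller-initialized values.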
} else {
T *grad_cur = &grads[i * numclass];
const T *softmax_probs_cur = &softmax_probs[i * numclass];
T *prob_num_cur = &prob_num[i * numclass];
int U = 2 * label_squence_length[i] + 1;
T log_pzx = -std::numeric_limits<T>::infinity();
const T *log_alpha_b_cur = &log_alpha_b[i * SOffSet * maxtime];
const T *log_beta_b_cur = &log_beta_b[i * SOffSet * maxtime];
int *label_value_with_blank_cur = &label_value_with_blank[0];
if (i > 0) {
label_value_with_blank_cur = &label_value_with_blank[2 * cum_labels_length[i - 1] + i];
}
for (int u = 0; u < U; ++u) {
log_pzx = LogSumExp(log_pzx, log_alpha_b_cur[u * maxtime] + log_beta_b_cur[u * maxtime]);
}
cost[i] = -log_pzx;
// grad
int L = numclass;
int Ti = sequence_length[i];
if (log_pzx == -std::numeric_limits<T>::infinity()) {
for (int t = 0; t < Ti; ++t) {
for (int l = 0; l < L; ++l) {
grad_cur[t * numclass * batch + l] = softmax_probs_cur[t * numclass * batch + l];
}
}
} else {
for (int t = 0; t < Ti; ++t) {
for (int u = 0; u < U; ++u) {
int l = label_value_with_blank_cur[u];
prob_num_cur[t * batch * numclass + l] =
LogSumExp(prob_num_cur[t * batch * numclass + l],
log_alpha_b_cur[u * maxtime + t] + log_beta_b_cur[u * maxtime + t]);
}
for (int l = 0; l < L; ++l) {
grad_cur[t * numclass * batch + l] =
softmax_probs_cur[t * numclass * batch + l] - expf(prob_num_cur[t * batch * numclass + l] - log_pzx);
}
}
}
}
}
}
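// Numerically stable softmax over the class dimension for every valid
// (time, batch) position; positions beyond sequence_length are left untouched.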
template <typename T>
__global__ void InnerSoftMaxKernel(const T *probs, T *softmax_probs, const int *sequence_length, int max_time,
int batch, int numclass) {
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < batch * max_time; i += blockDim.x * gridDim.x) {
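    // Decompose the flat index for the time-major layout: k is the time step and
    // m is the batch element.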
int k = i / batch;
int m = i % batch;
if (k < sequence_length[m]) {
T maxCoeff = 0.;
T sumCoeff = 0.;
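      // Find the max logit first and subtract it before exponentiating, for
      // numerical stability.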
for (int j = i * numclass; j < (i + 1) * numclass; ++j) {
if (probs[j] > maxCoeff) {
maxCoeff = probs[j];
}
}
for (int j = i * numclass; j < (i + 1) * numclass; ++j) {
sumCoeff += exp(probs[j] - maxCoeff);
softmax_probs[j] = exp(probs[j] - maxCoeff);
}
for (int j = i * numclass; j < (i + 1) * numclass; ++j) {
softmax_probs[j] /= sumCoeff;
}
}
}
}
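// preprocess_collapse_repeated: collapses runs of identical labels within each
// sample and recomputes that sample's label length in-place.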
__global__ void GenLabelValuePCRKernel(int *label_value_sp, int *label_value_pcr, int *label_squence_length,
int *cum_labels_length, int batch) {
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < batch; i += blockDim.x * gridDim.x) {
int L = label_squence_length[i];
label_squence_length[i] = 0;
int offset = 0;
if (i > 0) {
offset = cum_labels_length[i - 1];
}
    for (int l = offset; l < offset + L; ++l) {
if (l == offset || label_value_sp[l] != label_value_sp[l - 1]) {
label_value_pcr[offset + label_squence_length[i]++] = label_value_sp[l];
}
}
}
}
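// Single-thread kernel: recomputes the running maximum and the cumulative sums of
// the per-sample label lengths after they have been modified.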
__global__ void UpdateLengthKernel(int *label_squence_length, int *cum_labels_length, int *max_labels_length,
int batch) {
max_labels_length[0] = 0;
for (int i = 0; i < batch; ++i) {
if (label_squence_length[i] > max_labels_length[0]) {
max_labels_length[0] = label_squence_length[i];
}
if (i == 0) {
cum_labels_length[i] = label_squence_length[i];
} else {
cum_labels_length[i] = label_squence_length[i] + cum_labels_length[i - 1];
}
}
}
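// Host-side launchers: each one initializes its working buffer and dispatches the
// corresponding kernel on the caller's CUDA stream.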
template <typename T>
void CalculateBwdVar(T *log_beta_b, int *label_value_with_blank, T *softmax_probs, const int *sequence_length,
bool ctc_merge_repeated, int batch, int SOffSet, int maxtime, int blank, int *label_squence_length,
int *cum_labels_length, bool ignore_longer_outputs_than_inputs, cudaStream_t stream) {
int log_prob_size = SOffSet * batch * maxtime;
LogBInitKernel<<<GET_BLOCKS(log_prob_size), GET_THREADS, 0, stream>>>(log_beta_b, log_prob_size);
CalculateBwdVarKernel<<<GET_BLOCKS(batch), GET_THREADS, 0, stream>>>(
log_beta_b, label_value_with_blank, softmax_probs, sequence_length, ctc_merge_repeated, batch, SOffSet, maxtime,
blank, label_squence_length, cum_labels_length, ignore_longer_outputs_than_inputs);
}
template <typename T>
void CalculateFwdVar(T *log_alpha_b, int *label_value_with_blank, T *softmax_probs, const int *sequence_length,
bool ctc_merge_repeated, int batch, int SOffSet, int maxtime, int blank, int *label_squence_length,
int *cum_labels_length, bool ignore_longer_outputs_than_inputs, cudaStream_t stream) {
int log_prob_size = SOffSet * batch * maxtime;
LogBInitKernel<<<GET_BLOCKS(log_prob_size), GET_THREADS, 0, stream>>>(log_alpha_b, log_prob_size);
CalculateFwdVarKernel<<<GET_BLOCKS(batch), GET_THREADS, 0, stream>>>(
log_alpha_b, label_value_with_blank, softmax_probs, sequence_length, ctc_merge_repeated, batch, SOffSet, maxtime,
blank, label_squence_length, cum_labels_length, ignore_longer_outputs_than_inputs);
}
template <typename T>
void InnerSoftMax(const T *probs, T *softmax_probs, const int *sequence_length, int max_time, int batch, int numclass,
cudaStream_t stream) {
InnerSoftMaxKernel<<<GET_BLOCKS(batch * max_time), GET_THREADS, 0, stream>>>(probs, softmax_probs, sequence_length,
max_time, batch, numclass);
}
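// Interleaves blanks with each sample's labels: (l1, l2, ...) becomes
// (blank, l1, blank, l2, ..., blank), i.e. 2 * len + 1 symbols per sample.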
__global__ void GenLabelWithBlankKernel(int *label_value, int *label_value_with_blank, int *label_squence_length,
int *precum_labels_length, int *cum_labels_length, int batch, int blank) {
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < batch; i += blockDim.x * gridDim.x) {
int offset = 0;
int offset1 = 0;
if (i > 0) {
offset = 2 * cum_labels_length[i - 1] + i;
offset1 = precum_labels_length[i - 1];
}
for (int j = 0; j < label_squence_length[i]; ++j) {
label_value_with_blank[offset + 2 * j] = blank;
label_value_with_blank[offset + 2 * j + 1] = label_value[offset1 + j];
}
label_value_with_blank[offset + 2 * label_squence_length[i]] = blank;
}
}
void GenLabelWithBlank(int *label_value, int *label_value_with_blank, int *label_squence_length,
int *precum_labels_length, int *cum_labels_length, int batch, int blank, cudaStream_t stream) {
GenLabelWithBlankKernel<<<GET_BLOCKS(batch), GET_THREADS, 0, stream>>>(
label_value, label_value_with_blank, label_squence_length, precum_labels_length, cum_labels_length, batch, blank);
}
void GenLabelValuePCR(int *label_value_sp, int *label_value_pcr, int *label_squence_length, int *cum_labels_length,
int *max_labels_length, int batch, cudaStream_t stream) {
GenLabelValuePCRKernel<<<GET_BLOCKS(batch), GET_THREADS, 0, stream>>>(label_value_sp, label_value_pcr,
label_squence_length, cum_labels_length, batch);
UpdateLengthKernel<<<1, 1, 0, stream>>>(label_squence_length, cum_labels_length, max_labels_length, batch);
}
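// Scatters the sparse (indices, values) label representation into a dense
// per-sample layout, using the cumulative lengths as per-sample offsets.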
__global__ void GenLabelValueKernel(int *label_value_sp, const int64_t *label_indices, const int *label_values,
int *label_squence_length, int *cum_labels_length, int size) {
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < size; i += blockDim.x * gridDim.x) {
int64_t b = label_indices[i * 2];
int offset = 0;
if (b > 0) {
offset = cum_labels_length[b - 1];
}
int64_t index = offset + label_indices[i * 2 + 1];
label_value_sp[index] = label_values[i];
}
}
__global__ void LabelValueInitKernel(int *label_value_sp, int size, int blank) {
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < size; i += blockDim.x * gridDim.x) {
label_value_sp[i] = blank;
}
}
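// Trims each sample's length at the first blank (or out-of-range) value, since
// LabelValueInitKernel pre-filled unused slots with the blank index.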
__global__ void RecalculateLengthKernel(int *label_value_sp, int *label_squence_length, int *cum_labels_length,
int batch, int blank) {
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < batch; i += blockDim.x * gridDim.x) {
int offset = 0;
if (i > 0) {
offset = cum_labels_length[i - 1];
}
int L = label_squence_length[i];
label_squence_length[i] = 0;
for (int j = offset; j < offset + L; ++j) {
if (label_value_sp[j] >= blank) {
break;
} else {
label_squence_length[i]++;
}
}
}
}
void GenLabelValue(int *label_value_sp, const int64_t *label_indices, const int *label_values,
int *label_squence_length, int *cum_labels_length, int *max_labels_length, int size, int blank,
int batch, cudaStream_t stream) {
LabelValueInitKernel<<<GET_BLOCKS(size), GET_THREADS, 0, stream>>>(label_value_sp, size, blank);
GenLabelValueKernel<<<GET_BLOCKS(size), GET_THREADS, 0, stream>>>(label_value_sp, label_indices, label_values,
label_squence_length, cum_labels_length, size);
RecalculateLengthKernel<<<GET_BLOCKS(batch), GET_THREADS, 0, stream>>>(label_value_sp, label_squence_length,
cum_labels_length, batch, blank);
UpdateLengthKernel<<<1, 1, 0, stream>>>(label_squence_length, cum_labels_length, max_labels_length, batch);
}
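// Single-thread kernel: counts each sample's labels from the sparse indices and
// builds both cumulative-length arrays. max_labels_length temporarily holds the
// largest batch index seen, which the host uses to validate the batch size.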
__global__ void CalculatePreLengthKernel(int *label_squence_length, int *precum_labels_length, int *cum_labels_length,
int *max_labels_length, const int64_t *label_indices, int batch, int size) {
max_labels_length[0] = 0;
for (int i = 0; i < size; ++i) {
label_squence_length[label_indices[i * 2]]++;
if (max_labels_length[0] < label_indices[i * 2]) {
max_labels_length[0] = label_indices[i * 2];
}
}
precum_labels_length[0] = label_squence_length[0];
cum_labels_length[0] = label_squence_length[0];
for (int i = 1; i < batch; ++i) {
cum_labels_length[i] = cum_labels_length[i - 1] + label_squence_length[i];
precum_labels_length[i] = precum_labels_length[i - 1] + label_squence_length[i];
}
}
__global__ void CalculateMaxSequenceKernel(const int *sequence_length, int *max_labels_length, int batch) {
max_labels_length[0] = 0;
for (int i = 0; i < batch; ++i) {
if (sequence_length[i] > max_labels_length[0]) {
max_labels_length[0] = sequence_length[i];
}
}
}
void CalculateMaxSequence(const int *sequence_length, int *max_labels_length, int batch, cudaStream_t stream) {
CalculateMaxSequenceKernel<<<1, 1, 0, stream>>>(sequence_length, max_labels_length, batch);
}
void CalculatePreLength(int *label_squence_length, int *precum_labels_length, int *cum_labels_length,
int *max_labels_length, const int64_t *label_indices, int batch, int size,
cudaStream_t stream) {
CalculatePreLengthKernel<<<1, 1, 0, stream>>>(label_squence_length, precum_labels_length, cum_labels_length,
max_labels_length, label_indices, batch, size);
}
template <typename T>
void CTCLoss(T *log_alpha_b, T *log_beta_b, T *softmax_probs, int *label_value_with_blank, int batch, int SOffSet,
int maxtime, int numclass, const int *sequence_length, int *label_squence_length, int *cum_labels_length,
T *cost, T *grads, T *prob_num, bool ignore_longer_outputs_than_inputs, cudaStream_t stream) {
ProbInitKernel<<<GET_BLOCKS(maxtime * batch * numclass), GET_THREADS, 0, stream>>>(prob_num,
maxtime * batch * numclass);
CTCLossKernel<<<GET_BLOCKS(batch), GET_THREADS, 0, stream>>>(
log_alpha_b, log_beta_b, softmax_probs, label_value_with_blank, batch, SOffSet, maxtime, numclass, sequence_length,
label_squence_length, cum_labels_length, cost, grads, prob_num, ignore_longer_outputs_than_inputs);
}
template void CalculateFwdVar<float>(float *log_alpha_b, int *label_value_with_blank, float *softmax_probs,
const int *sequence_length, bool ctc_merge_repeated, int batch, int SOffSet,
int maxtime, int blank, int *label_squence_length, int *cum_labels_length,
bool ignore_longer_outputs_than_inputs, cudaStream_t stream);
template void CalculateBwdVar<float>(float *log_beta_b, int *label_value_with_blank, float *softmax_probs,
const int *sequence_length, bool ctc_merge_repeated, int batch, int SOffSet,
int maxtime, int blank, int *label_squence_length, int *cum_labels_length,
bool ignore_longer_outputs_than_inputs, cudaStream_t stream);
template void InnerSoftMax<float>(const float *probs, float *softmax_probs, const int *sequence_length, int max_time,
int batch, int numclass, cudaStream_t stream);
template void CTCLoss<float>(float *log_alpha_b, float *log_beta_b, float *softmax_probs, int *label_value_with_blank,
int batch, int SOffSet, int maxtime, int numclass, const int *sequence_length,
int *label_squence_length, int *cum_labels_length, float *cost, float *grads,
float *prob_num, bool ignore_longer_outputs_than_inputs, cudaStream_t stream);

View File

@ -0,0 +1,51 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_GPU_CUDA_IMPL_CTCLOSS_IMPL_CUH
#define MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_GPU_CUDA_IMPL_CTCLOSS_IMPL_CUH
#include <cuda_runtime.h>
#include <cstdint>
template <typename T>
void CalculateFwdVar(T *log_alpha_b, int *label_value_with_blank, T *softmax_probs, const int *sequence_length,
bool ctc_merge_repeated, int batch, int SOffSet, int maxtime, int blank, int *label_squence_length,
int *cum_labels_length, bool ignore_longer_outputs_than_inputs, cudaStream_t stream);
template <typename T>
void CalculateBwdVar(T *log_beta_b, int *label_value_with_blank, T *softmax_probs, const int *sequence_length,
bool ctc_merge_repeated, int batch, int SOffSet, int maxtime, int blank, int *label_squence_length,
int *cum_labels_length, bool ignore_longer_outputs_than_inputs, cudaStream_t stream);
template <typename T>
void InnerSoftMax(const T *probs, T *softmax_probs, const int *sequence_length, int max_time, int batch, int numclass,
cudaStream_t stream);
void GenLabelValuePCR(int *label_value_sp, int *label_value_pcr, int *label_squence_length, int *cum_labels_length,
int *max_labels_length, int batch, cudaStream_t stream);
void GenLabelWithBlank(int *label_value, int *label_value_with_blank, int *label_squence_length,
int *precum_labels_length, int *cum_labels_length, int batch, int blank, cudaStream_t stream);
void GenLabelValue(int *label_value_sp, const int64_t *label_indices, const int *label_values,
int *label_squence_length, int *cum_labels_length, int *max_labels_length, int size, int blank,
int batch, cudaStream_t stream);
void CalculatePreLength(int *label_squence_length, int *precum_labels_length, int *cum_labels_length,
int *max_labels_length, const int64_t *label_indices, int batch, int size, cudaStream_t stream);
void CalculateMaxSequence(const int *sequence_length, int *max_labels_length, int batch, cudaStream_t stream);
template <typename T>
void CTCLoss(T *log_alpha_b, T *log_beta_b, T *softmax_probs, int *label_value_with_blank, int batch, int SOffSet,
int maxtime, int numclass, const int *sequence_length, int *label_squence_length, int *cum_labels_length,
T *cost, T *grads, T *prob_num, bool ignore_longer_outputs_than_inputs, cudaStream_t stream);
#endif // MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_GPU_CUDA_IMPL_CTCLOSS_IMPL_CUH

View File

@ -44,8 +44,12 @@ MS_REG_GPU_KERNEL_ONE(Square, KernelAttr().AddInputAttr(kNumberTypeFloat16).AddO
UnaryOpGpuKernel, half)
MS_REG_GPU_KERNEL_ONE(Sqrt, KernelAttr().AddInputAttr(kNumberTypeFloat32).AddOutputAttr(kNumberTypeFloat32),
UnaryOpGpuKernel, float)
MS_REG_GPU_KERNEL_ONE(Sqrt, KernelAttr().AddInputAttr(kNumberTypeFloat16).AddOutputAttr(kNumberTypeFloat16),
UnaryOpGpuKernel, half)
MS_REG_GPU_KERNEL_ONE(Rsqrt, KernelAttr().AddInputAttr(kNumberTypeFloat32).AddOutputAttr(kNumberTypeFloat32),
UnaryOpGpuKernel, float)
MS_REG_GPU_KERNEL_ONE(Rsqrt, KernelAttr().AddInputAttr(kNumberTypeFloat16).AddOutputAttr(kNumberTypeFloat16),
UnaryOpGpuKernel, half)
MS_REG_GPU_KERNEL_ONE(Sin, KernelAttr().AddInputAttr(kNumberTypeFloat32).AddOutputAttr(kNumberTypeFloat32),
UnaryOpGpuKernel, float)
MS_REG_GPU_KERNEL_ONE(Sin, KernelAttr().AddInputAttr(kNumberTypeFloat16).AddOutputAttr(kNumberTypeFloat16),

View File

@ -1,31 +1,31 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "backend/kernel_compiler/gpu/nn/ctcloss_gpu_kernel.h"
namespace mindspore {
namespace kernel {
MS_REG_GPU_KERNEL_ONE(CTCLossV2,
KernelAttr()
.AddInputAttr(kNumberTypeFloat32)
.AddInputAttr(kNumberTypeInt32)
.AddInputAttr(kNumberTypeInt32)
.AddInputAttr(kNumberTypeInt32)
.AddOutputAttr(kNumberTypeFloat32)
.AddOutputAttr(kNumberTypeFloat32),
CtcLossGpuKernel, float)
} // namespace kernel
} // namespace mindspore
/**
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "backend/kernel_compiler/gpu/nn/ctcloss_gpu_kernel.h"
namespace mindspore {
namespace kernel {
MS_REG_GPU_KERNEL_ONE(CTCLoss,
KernelAttr()
.AddInputAttr(kNumberTypeFloat32)
.AddInputAttr(kNumberTypeInt64)
.AddInputAttr(kNumberTypeInt32)
.AddInputAttr(kNumberTypeInt32)
.AddOutputAttr(kNumberTypeFloat32)
.AddOutputAttr(kNumberTypeFloat32),
CtcLossGpuKernel, float)
} // namespace kernel
} // namespace mindspore

View File

@ -1,192 +1,233 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_GPU_NN_CTCLOSS_GPU_KERNEL_H_
#define MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_GPU_NN_CTCLOSS_GPU_KERNEL_H_
#include <cuda_runtime_api.h>
#include <vector>
#include "backend/kernel_compiler/gpu/gpu_kernel.h"
#include "backend/kernel_compiler/gpu/gpu_kernel_factory.h"
#include "runtime/device/gpu/gpu_memory_allocator.h"
namespace mindspore {
namespace kernel {
template <typename T>
class CtcLossGpuKernel : public GpuKernel {
public:
CtcLossGpuKernel()
: cudnn_handle_(nullptr),
probs_desc_(nullptr),
ctcloss_desc_(nullptr),
label_size_(0),
input_lengths_size_(0),
label_lengths_size_(0) {}
~CtcLossGpuKernel() override { DestroyResource(); }
const std::vector<size_t> &GetInputSizeList() const override { return input_size_list_; }
const std::vector<size_t> &GetOutputSizeList() const override { return output_size_list_; }
const std::vector<size_t> &GetWorkspaceSizeList() const override { return workspace_size_list_; }
bool Launch(const std::vector<AddressPtr> &inputs, const std::vector<AddressPtr> &,
const std::vector<AddressPtr> &outputs, void *stream_ptr) override {
float *probs = GetDeviceAddress<float>(inputs, 0);
float *costs = GetDeviceAddress<float>(outputs, 0);
float *grads = GetDeviceAddress<float>(outputs, 1);
// Copy labels/input_lengths/label_length to host as cudnn7.x.x requires
int *labels_host = nullptr;
int *no_blank_labels_host = nullptr;
void *input_lengths_host = nullptr;
void *label_lengths_host = nullptr;
cudaStream_t stream = reinterpret_cast<cudaStream_t>(stream_ptr);
AllocHostMem(&labels_host, &no_blank_labels_host, &input_lengths_host, &label_lengths_host, inputs);
CopyToHostSync(labels_host, no_blank_labels_host, input_lengths_host, label_lengths_host, inputs, stream);
size_t workspace_size = 0;
CHECK_CUDNN_RET_WITH_EXCEPT(
cudnnGetCTCLossWorkspaceSize(
cudnn_handle_, probs_desc_, probs_desc_, reinterpret_cast<int *>(no_blank_labels_host),
reinterpret_cast<int *>(label_lengths_host), reinterpret_cast<int *>(input_lengths_host),
CUDNN_CTC_LOSS_ALGO_DETERMINISTIC, ctcloss_desc_, &workspace_size),
"cudnnGetCTCLossWorkspaceSize failed.");
void *workspace = device::gpu::GPUMemoryAllocator::GetInstance().AllocTensorMem(workspace_size);
if (workspace == nullptr) {
MS_LOG(EXCEPTION) << "Failed to alloc workspace, size: " << workspace_size;
}
CHECK_CUDNN_RET_WITH_EXCEPT(
cudnnCTCLoss(cudnn_handle_, probs_desc_, probs, reinterpret_cast<int *>(no_blank_labels_host),
reinterpret_cast<int *>(label_lengths_host), reinterpret_cast<int *>(input_lengths_host), costs,
probs_desc_, grads, CUDNN_CTC_LOSS_ALGO_DETERMINISTIC, ctcloss_desc_, workspace, workspace_size),
"cudnnCtcLoss failed.");
CHECK_CUDA_RET_WITH_EXCEPT(cudaStreamSynchronize(stream), "cudaStreamSynchronize failed.");
device::gpu::GPUMemoryAllocator::GetInstance().FreeTensorMem(workspace);
FreeHostMem(labels_host, no_blank_labels_host, input_lengths_host, label_lengths_host);
return true;
}
bool Init(const CNodePtr &kernel_node) override {
InitResource();
auto probs_shape = AnfAlgo::GetPrevNodeOutputInferShape(kernel_node, 0);
if (probs_shape.size() != 3) {
MS_LOG(EXCEPTION) << "probs dims: " << probs_shape.size() << " not support.";
}
probs_dims_[0] = probs_shape[0];
probs_dims_[1] = probs_shape[1];
probs_dims_[2] = probs_shape[2];
auto labels_dims = AnfAlgo::GetPrevNodeOutputInferShape(kernel_node, 1);
if (labels_dims.size() != 1 && labels_dims.size() != 2) {
MS_LOG(EXCEPTION) << "labels dims: " << labels_dims.size() << " not support.";
}
label_size_ = sizeof(int);
for (auto i : labels_dims) {
label_size_ *= i;
}
auto input_length_dims = AnfAlgo::GetPrevNodeOutputInferShape(kernel_node, 2);
input_lengths_size_ = input_length_dims[0] * sizeof(int);
auto label_length_dims = AnfAlgo::GetPrevNodeOutputInferShape(kernel_node, 3);
label_lengths_size_ = label_length_dims[0] * sizeof(int);
CHECK_CUDNN_RET_WITH_EXCEPT(
cudnnSetTensorNdDescriptorEx(probs_desc_, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, 3, probs_dims_),
"cudnnSetTensorNdDescriptorEx failed.");
CHECK_CUDNN_RET_WITH_EXCEPT(cudnnSetCTCLossDescriptorEx(ctcloss_desc_, CUDNN_DATA_FLOAT,
CUDNN_LOSS_NORMALIZATION_SOFTMAX, CUDNN_PROPAGATE_NAN),
"cudnnSetCTCLossDescriptorEx failed.");
InitSizeLists();
return true;
}
protected:
void InitResource() override {
cudnn_handle_ = device::gpu::GPUDeviceManager::GetInstance().GetCudnnHandle();
CHECK_CUDNN_RET_WITH_EXCEPT(cudnnCreateTensorDescriptor(&probs_desc_), "cudnnCreateTensorDescriptor failed.");
CHECK_CUDNN_RET_WITH_EXCEPT(cudnnCreateCTCLossDescriptor(&ctcloss_desc_), "cudnnCreateCTCLossDescriptor failed.");
}
void InitSizeLists() override {
input_size_list_.push_back(probs_dims_[0] * probs_dims_[1] * probs_dims_[2] * sizeof(float));
input_size_list_.push_back(label_size_);
input_size_list_.push_back(input_lengths_size_);
input_size_list_.push_back(label_lengths_size_);
output_size_list_.push_back(probs_dims_[1] * sizeof(float));
output_size_list_.push_back(probs_dims_[0] * probs_dims_[1] * probs_dims_[2] * sizeof(float));
}
private:
void DestroyResource() noexcept {
CHECK_CUDNN_RET_WITH_ERROR(cudnnDestroyCTCLossDescriptor(ctcloss_desc_), "cudnnDestroyCTCLossDescriptor failed.");
CHECK_CUDNN_RET_WITH_ERROR(cudnnDestroyTensorDescriptor(probs_desc_), "cudnnDestroyTensorDescriptor failed.");
}
void AllocHostMem(int **labels_host, int **no_blank_labels_host, void **input_lengths_host, void **label_lengths_host,
const std::vector<AddressPtr> &inputs) {
CHECK_CUDA_RET_WITH_EXCEPT(cudaMallocHost(labels_host, inputs[1]->size), "cudaMallocHost failed.");
CHECK_CUDA_RET_WITH_EXCEPT(cudaMallocHost(no_blank_labels_host, inputs[1]->size), "cudaMallocHost failed.");
CHECK_CUDA_RET_WITH_EXCEPT(cudaMallocHost(input_lengths_host, inputs[2]->size), "cudaMallocHost failed.");
CHECK_CUDA_RET_WITH_EXCEPT(cudaMallocHost(label_lengths_host, inputs[3]->size), "cudaMallocHost failed.");
}
void FreeHostMem(int *labels_host, int *no_blank_labels_host, void *input_lengths_host, void *label_lengths_host) {
CHECK_CUDA_RET_WITH_EXCEPT(cudaFreeHost(label_lengths_host), "cudaFreeHost failed.");
CHECK_CUDA_RET_WITH_EXCEPT(cudaFreeHost(input_lengths_host), "cudaFreeHost failed.");
CHECK_CUDA_RET_WITH_EXCEPT(cudaFreeHost(labels_host), "cudaFreeHost failed.");
CHECK_CUDA_RET_WITH_EXCEPT(cudaFreeHost(no_blank_labels_host), "cudaFreeHost failed.");
}
void CopyToHostSync(int *labels_host, int *no_blank_labels_host, void *input_lengths_host, void *label_lengths_host,
const std::vector<AddressPtr> &inputs, cudaStream_t stream) {
CHECK_CUDA_RET_WITH_EXCEPT(cudaStreamSynchronize(stream), "cudaStreamSynchronize failed.");
CHECK_CUDA_RET_WITH_EXCEPT(
cudaMemcpyAsync(labels_host, inputs[1]->addr, inputs[1]->size, cudaMemcpyDeviceToHost, stream),
"cudaMemcpyAsync failed.");
CHECK_CUDA_RET_WITH_EXCEPT(
cudaMemcpyAsync(input_lengths_host, inputs[2]->addr, inputs[2]->size, cudaMemcpyDeviceToHost, stream),
"cudaMemcpyAsync failed.");
CHECK_CUDA_RET_WITH_EXCEPT(
cudaMemcpyAsync(label_lengths_host, inputs[3]->addr, inputs[3]->size, cudaMemcpyDeviceToHost, stream),
"cudaMemcpyAsync failed.");
CHECK_CUDA_RET_WITH_EXCEPT(cudaStreamSynchronize(stream), "cudaStreamSynchronize failed.");
// remove blank element
size_t j = 0;
for (size_t i = 0; i < inputs[1]->size / sizeof(int); i++) {
if (labels_host[i] != 0) {
no_blank_labels_host[j] = labels_host[i];
j++;
}
}
}
std::vector<size_t> input_size_list_;
std::vector<size_t> output_size_list_;
std::vector<size_t> workspace_size_list_;
cudnnHandle_t cudnn_handle_;
cudnnTensorDescriptor_t probs_desc_;
cudnnCTCLossDescriptor_t ctcloss_desc_;
int probs_dims_[3] = {0};
int label_size_;
int input_lengths_size_;
int label_lengths_size_;
};
} // namespace kernel
} // namespace mindspore
#endif // MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_GPU_NN_CTCLOSS_GPU_KERNEL_H_
/**
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_GPU_NN_CTCLOSS_GPU_KERNEL_H_
#define MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_GPU_NN_CTCLOSS_GPU_KERNEL_H_
#include <cuda_runtime_api.h>
#include <vector>
#include <limits>
#include "backend/kernel_compiler/gpu/gpu_kernel.h"
#include "backend/kernel_compiler/gpu/gpu_kernel_factory.h"
#include "runtime/device/gpu/gpu_memory_allocator.h"
#include "backend/kernel_compiler/gpu/cuda_impl/ctcloss_impl.cuh"
namespace mindspore {
namespace kernel {
template <typename T>
class CtcLossGpuKernel : public GpuKernel {
public:
CtcLossGpuKernel()
: label_indice_size_(0),
label_size_(0),
squence_lengths_size_(0),
preprocess_collapse_repeated_(false),
ctc_merge_repeated_(true),
ignore_longer_outputs_than_inputs_(false) {}
~CtcLossGpuKernel() override = default;
const std::vector<size_t> &GetInputSizeList() const override { return input_size_list_; }
const std::vector<size_t> &GetOutputSizeList() const override { return output_size_list_; }
const std::vector<size_t> &GetWorkspaceSizeList() const override { return workspace_size_list_; }
bool Launch(const std::vector<AddressPtr> &inputs, const std::vector<AddressPtr> &workspace,
const std::vector<AddressPtr> &outputs, void *stream_ptr) override {
cudaStream_t stream = reinterpret_cast<cudaStream_t>(stream_ptr);
const T *probs = GetDeviceAddress<T>(inputs, 0);
const int64_t *label_indices = GetDeviceAddress<int64_t>(inputs, 1);
const int *label_values = GetDeviceAddress<int>(inputs, 2);
const int *sequence_length = GetDeviceAddress<int>(inputs, 3);
T *costs = GetDeviceAddress<T>(outputs, 0);
T *grads = GetDeviceAddress<T>(outputs, 1);
T *softmax_probs = GetDeviceAddress<T>(workspace, 0);
int *cum_labels_length = GetDeviceAddress<int>(workspace, 1);
int *label_squence_length = GetDeviceAddress<int>(workspace, 2);
int *label_value_sp = GetDeviceAddress<int>(workspace, 3);
int *label_value_pcr = GetDeviceAddress<int>(workspace, 4);
T *prob_num = GetDeviceAddress<T>(workspace, 5);
int *precum_labels_length = GetDeviceAddress<int>(workspace, 6);
int *max_labels_length = GetDeviceAddress<int>(workspace, 7);
int numclass = SizeToInt(probs_dims_[2]);
int batch = SizeToInt(probs_dims_[1]);
int max_time = SizeToInt(probs_dims_[0]);
int max_sequence = 0;
CalculateMaxSequence(sequence_length, max_labels_length, batch, stream);
CHECK_CUDA_RET_WITH_EXCEPT(
cudaMemcpyAsync(&max_sequence, max_labels_length, sizeof(int), cudaMemcpyDeviceToHost, stream),
"cudaMemcpyAsync failed.");
CHECK_CUDA_RET_WITH_EXCEPT(cudaStreamSynchronize(stream), "cudaStreamSynchronize failed.");
if (max_time < max_sequence) {
MS_LOG(EXCEPTION) << "max_time should be greater than sequence length.";
}
InnerSoftMax(probs, softmax_probs, sequence_length, max_time, batch, numclass, stream);
MemsetForWS(label_value_pcr, cum_labels_length, label_squence_length, costs, grads, stream);
int max_labels_length_host = 0;
int batch_label = 0;
int *label_value_with_blank = nullptr;
T *log_alpha_b = nullptr;
T *log_beta_b = nullptr;
CalculatePreLength(label_squence_length, precum_labels_length, cum_labels_length, max_labels_length, label_indices,
batch, label_size_ / sizeof(int), stream);
CHECK_CUDA_RET_WITH_EXCEPT(
cudaMemcpyAsync(&batch_label, max_labels_length, sizeof(int), cudaMemcpyDeviceToHost, stream),
"cudaMemcpyAsync failed.");
CHECK_CUDA_RET_WITH_EXCEPT(cudaStreamSynchronize(stream), "cudaStreamSynchronize failed.");
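    // batch_label now holds the largest batch index found in label_indices
    // (see CalculatePreLengthKernel), so the batch size must be batch_label + 1.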
if (batch != batch_label + 1) {
MS_LOG(EXCEPTION) << "label batch should be equal to input batch.";
}
GenLabelValue(label_value_sp, label_indices, label_values, label_squence_length, cum_labels_length,
max_labels_length, label_size_ / sizeof(int), numclass - 1, batch, stream);
if (preprocess_collapse_repeated_) {
GenLabelValuePCR(label_value_sp, label_value_pcr, label_squence_length, cum_labels_length, max_labels_length,
batch, stream);
}
CHECK_CUDA_RET_WITH_EXCEPT(
cudaMemcpyAsync(&max_labels_length_host, max_labels_length, sizeof(int), cudaMemcpyDeviceToHost, stream),
"cudaMemcpyAsync failed.");
CHECK_CUDA_RET_WITH_EXCEPT(cudaStreamSynchronize(stream), "cudaStreamSynchronize failed.");
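    // A label of length L expands to 2 * L + 1 symbols once blanks are interleaved.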
int SOffSet = 2 * max_labels_length_host + 1;
int log_prob_size = batch * SOffSet * max_time;
if (!ignore_longer_outputs_than_inputs_ && max_labels_length_host > max_time) {
MS_LOG(EXCEPTION) << "output size is greater than input size.";
}
MemManageForCus(&log_alpha_b, &log_beta_b, &label_value_with_blank, cum_labels_length, log_prob_size, batch,
stream);
if (preprocess_collapse_repeated_) {
GenLabelWithBlank(label_value_pcr, label_value_with_blank, label_squence_length, precum_labels_length,
cum_labels_length, batch, numclass - 1, stream);
} else {
GenLabelWithBlank(label_value_sp, label_value_with_blank, label_squence_length, precum_labels_length,
cum_labels_length, batch, numclass - 1, stream);
}
CalculateFwdVar(log_alpha_b, label_value_with_blank, softmax_probs, sequence_length, ctc_merge_repeated_, batch,
SOffSet, max_time, numclass - 1, label_squence_length, cum_labels_length,
ignore_longer_outputs_than_inputs_, stream);
CalculateBwdVar(log_beta_b, label_value_with_blank, softmax_probs, sequence_length, ctc_merge_repeated_, batch,
SOffSet, max_time, numclass - 1, label_squence_length, cum_labels_length,
ignore_longer_outputs_than_inputs_, stream);
CTCLoss(log_alpha_b, log_beta_b, softmax_probs, label_value_with_blank, batch, SOffSet, max_time, numclass,
sequence_length, label_squence_length, cum_labels_length, costs, grads, prob_num,
ignore_longer_outputs_than_inputs_, stream);
CHECK_CUDA_RET_WITH_EXCEPT(cudaStreamSynchronize(stream), "cudaStreamSynchronize failed.");
FreeMem(label_value_with_blank, log_alpha_b, log_beta_b);
return true;
}
bool Init(const CNodePtr &kernel_node) override {
InitResource();
auto probs_shape = AnfAlgo::GetPrevNodeOutputInferShape(kernel_node, 0);
if (probs_shape.size() != 3) {
MS_LOG(EXCEPTION) << "probs dims: " << probs_shape.size() << " not support.";
}
probs_dims_[0] = probs_shape[0];
probs_dims_[1] = probs_shape[1];
probs_dims_[2] = probs_shape[2];
auto indice_dims = AnfAlgo::GetPrevNodeOutputInferShape(kernel_node, 1);
auto labels_dims = AnfAlgo::GetPrevNodeOutputInferShape(kernel_node, 2);
if (labels_dims.size() != 1) {
MS_LOG(EXCEPTION) << "labels dims: " << labels_dims.size() << " not support.";
}
if (indice_dims.size() != 2) {
MS_LOG(EXCEPTION) << "labels indice dims: " << indice_dims.size() << " not support.";
}
label_size_ = sizeof(int);
for (auto i : labels_dims) {
label_size_ *= i;
}
label_indice_size_ = sizeof(int64_t);
for (auto i : indice_dims) {
label_indice_size_ *= i;
}
auto squence_length_dims = AnfAlgo::GetPrevNodeOutputInferShape(kernel_node, 3);
squence_lengths_size_ = squence_length_dims[0] * sizeof(int);
preprocess_collapse_repeated_ = GetAttr<bool>(kernel_node, "preprocess_collapse_repeated");
ctc_merge_repeated_ = GetAttr<bool>(kernel_node, "ctc_merge_repeated");
ignore_longer_outputs_than_inputs_ = GetAttr<bool>(kernel_node, "ignore_longer_outputs_than_inputs");
InitSizeLists();
return true;
}
protected:
void InitSizeLists() override {
input_size_list_.push_back(probs_dims_[0] * probs_dims_[1] * probs_dims_[2] * sizeof(T));
input_size_list_.push_back(label_indice_size_);
input_size_list_.push_back(label_size_);
input_size_list_.push_back(squence_lengths_size_);
workspace_size_list_.push_back(probs_dims_[0] * probs_dims_[1] * probs_dims_[2] * sizeof(T));
workspace_size_list_.push_back(squence_lengths_size_);
workspace_size_list_.push_back(squence_lengths_size_);
workspace_size_list_.push_back(label_size_);
workspace_size_list_.push_back(label_size_);
workspace_size_list_.push_back(probs_dims_[0] * probs_dims_[1] * probs_dims_[2] * sizeof(T));
workspace_size_list_.push_back(squence_lengths_size_);
workspace_size_list_.push_back(sizeof(int));
output_size_list_.push_back(probs_dims_[1] * sizeof(T));
output_size_list_.push_back(probs_dims_[0] * probs_dims_[1] * probs_dims_[2] * sizeof(T));
}
void MemsetForWS(int *label_value_pcr, int *cum_labels_length, int *label_squence_length, T *costs, T *grads,
cudaStream_t stream) {
    CHECK_CUDA_RET_WITH_EXCEPT(cudaMemsetAsync(label_value_pcr, 0, label_size_, stream),
                               "cudaMemsetAsync failed in CtcLossGpuKernel::Launch.");
    CHECK_CUDA_RET_WITH_EXCEPT(cudaMemsetAsync(cum_labels_length, 0, squence_lengths_size_, stream),
                               "cudaMemsetAsync failed in CtcLossGpuKernel::Launch.");
    CHECK_CUDA_RET_WITH_EXCEPT(cudaMemsetAsync(label_squence_length, 0, squence_lengths_size_, stream),
                               "cudaMemsetAsync failed in CtcLossGpuKernel::Launch.");
    CHECK_CUDA_RET_WITH_EXCEPT(cudaMemsetAsync(costs, 0, probs_dims_[1] * sizeof(T), stream),
                               "cudaMemsetAsync failed in CtcLossGpuKernel::Launch.");
    CHECK_CUDA_RET_WITH_EXCEPT(
      cudaMemsetAsync(grads, 0, probs_dims_[0] * probs_dims_[1] * probs_dims_[2] * sizeof(T), stream),
      "cudaMemsetAsync failed in CtcLossGpuKernel::Launch.");
}
void MemManageForCus(T **log_alpha_b, T **log_beta_b, int **label_value_with_blank, int *cum_labels_length,
int log_prob_size, int batch, cudaStream_t stream) {
int total_labels_size_host = 0;
CHECK_CUDA_RET_WITH_EXCEPT(cudaMalloc(reinterpret_cast<void **>(log_alpha_b), sizeof(T) * log_prob_size),
"cudaMalloc failed.");
CHECK_CUDA_RET_WITH_EXCEPT(cudaMalloc(reinterpret_cast<void **>(log_beta_b), sizeof(T) * log_prob_size),
"cudaMalloc failed.");
CHECK_CUDA_RET_WITH_EXCEPT(cudaMemcpyAsync(&total_labels_size_host, cum_labels_length + batch - 1, sizeof(int),
cudaMemcpyDeviceToHost, stream),
"cudaMemcpyAsync failed.");
CHECK_CUDA_RET_WITH_EXCEPT(cudaStreamSynchronize(stream), "cudaStreamSynchronize failed.");
CHECK_CUDA_RET_WITH_EXCEPT(
cudaMalloc(reinterpret_cast<void **>(label_value_with_blank), sizeof(int) * (2 * total_labels_size_host + batch)),
"cudaMalloc failed.");
}
void FreeMem(int *label_value_with_blank, T *log_alpha_b, T *log_beta_b) {
CHECK_CUDA_RET_WITH_EXCEPT(cudaFree(label_value_with_blank), "cudaFree failed.");
CHECK_CUDA_RET_WITH_EXCEPT(cudaFree(log_alpha_b), "cudaFree failed.");
CHECK_CUDA_RET_WITH_EXCEPT(cudaFree(log_beta_b), "cudaFree failed.");
}
std::vector<size_t> input_size_list_;
std::vector<size_t> output_size_list_;
std::vector<size_t> workspace_size_list_;
size_t probs_dims_[3] = {0};
int label_indice_size_;
int label_size_;
int squence_lengths_size_;
bool preprocess_collapse_repeated_;
bool ctc_merge_repeated_;
bool ignore_longer_outputs_than_inputs_;
T kLogZero_ = -std::numeric_limits<T>::infinity();
}; // class CtcLossGpuKernel
} // namespace kernel
} // namespace mindspore
#endif // MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_GPU_NN_CTCLOSS_GPU_KERNEL_H_

View File

@ -66,7 +66,6 @@ class MirrorPadGpuFwdKernel : public GpuKernel {
}
string mode = GetValue<string>(AnfAlgo::GetCNodePrimitive(kernel_node)->GetAttr("mode"));
if (mode == "REFLECT") {
mode_ = 0; // reflected mirroring
} else {

View File

@ -66,7 +66,6 @@ class MirrorPadGpuBackKernel : public GpuKernel {
}
string mode = GetValue<string>(AnfAlgo::GetCNodePrimitive(kernel_node)->GetAttr("mode"));
if (mode == "REFLECT") {
mode_ = 0; // reflected mirroring
} else {

View File

@ -27,6 +27,5 @@ MS_REG_GPU_KERNEL_ONE(
ROIAlign,
KernelAttr().AddInputAttr(kNumberTypeFloat16).AddInputAttr(kNumberTypeFloat16).AddOutputAttr(kNumberTypeFloat16),
ROIAlignGpuFwdKernel, half)
} // namespace kernel
} // namespace mindspore

View File

@ -27,6 +27,5 @@ MS_REG_GPU_KERNEL_ONE(
ROIAlignGrad,
KernelAttr().AddInputAttr(kNumberTypeFloat16).AddInputAttr(kNumberTypeFloat16).AddOutputAttr(kNumberTypeFloat16),
ROIAlignGradGpuFwdKernel, half)
} // namespace kernel
} // namespace mindspore

View File

@ -87,6 +87,7 @@ static std::map<string, string> tbe_func_adapter_map = {
{"apply_centered_rms_prop", "apply_centered_rms_prop_d"},
{"transpose", "transpose_d"},
{"fill", "fill_d"},
{"i_fmr", "ifmr"},
{"unsorted_segment_sum", "unsorted_segment_sum_d"},
{"unsorted_segment_prod", "unsorted_segment_prod_d"},
{"concat", "concat_d"},
@ -139,7 +140,8 @@ static std::map<string, string> tbe_func_adapter_map = {
{"inplace_update", "inplace_update_d"},
{"matrix_diag", "matrix_diag_d"},
{"matrix_diag_part", "matrix_diag_part_d"},
{"matrix_set_diag", "matrix_set_diag_d"}};
{"matrix_set_diag", "matrix_set_diag_d"},
{"l_stm_input_grad", "lstm_input_grad"}};
void TbeAdapter::NormalizeFuncName(std::string *func_name) {
if (func_name == nullptr) {

View File

@ -25,6 +25,7 @@
#include "backend/kernel_compiler/tbe/tbe_convert_utils.h"
#include "backend/kernel_compiler/tbe/tbe_utils.h"
#include "utils/ms_context.h"
#include "runtime/dev.h"
namespace mindspore {
namespace kernel {
@ -86,6 +87,8 @@ constexpr auto kJPyModulePath = "py_module_path";
constexpr auto kJPreBuildOutsAttrs = "prebuild_outs_attrs";
constexpr auto kJKwdArgs = "kwds_args";
constexpr auto kJListArgs = "list_args";
constexpr auto kJSocVersion = "socVersion";
constexpr auto kSOC_VERSION = "SOC_VERSION";
bool TbeKernelJsonCreator::GenTbeSingleKernelJson(const std::shared_ptr<mindspore::AnfNode> &anf_node,
nlohmann::json *kernel_json) {
@ -122,6 +125,8 @@ bool TbeKernelJsonCreator::GenTbeSingleKernelJson(const std::shared_ptr<mindspor
nlohmann::json attrs_json;
(void)GenTbeAttrJson(anf_node, op_info_ptr, &attrs_json);
op_info_json[kJAttrs] = attrs_json;
auto soc_version = TbeKernelJsonCreator::GetSocVersion();
op_info_json[kJSocVersion] = soc_version;
std::string json_str = op_info_json.dump();
size_t hash_id = std::hash<std::string>()(json_str);
auto context_ptr = MsContext::GetInstance();
@ -150,7 +155,13 @@ bool TbeKernelJsonCreator::GenInputDescJson(const std::shared_ptr<AnfNode> &anf_
MS_EXCEPTION_IF_NULL(input_ptr);
MS_EXCEPTION_IF_NULL(input_list);
std::string op_name = AnfAlgo::GetCNodeName(anf_node);
if (input_ptr->name() == "input_indices" && op_name == kTopKOpName) {
if (op_name == kDynamicRNNOpName && input_ptr->name() == "seq_length") {
nlohmann::json input_desc_json;
auto in_name = input_ptr->name();
input_desc_json[kJName] = in_name + std::to_string(input_i);
input_desc_json[kJValid] = false;
input_list->emplace_back(input_desc_json);
} else if (input_ptr->name() == "input_indices" && op_name == kTopKOpName) {
TbeAdapter::GenTopKV2IndicesTensorInfo(anf_node, real_input_index, input_list, creater_type_);
} else {
auto dtype = GetDeviceInputType(anf_node, real_input_index);
@ -408,6 +419,30 @@ bool TbeKernelJsonCreator::GenTbeAttrJson(const std::shared_ptr<AnfNode> &anf_no
return true;
}
string TbeKernelJsonCreator::GetSocVersion() {
// Get default soc version.
const int kSocVersionLen = 50;
char soc_version[kSocVersionLen] = {0};
auto ret = rtGetSocVersion(soc_version, kSocVersionLen);
if (ret != RT_ERROR_NONE) {
MS_LOG(EXCEPTION) << "GetSocVersion failed.";
}
MS_LOG(INFO) << "Default SocVersion is " << soc_version;
// Get soc version from env value.
const char *soc_version_env = getenv(kSOC_VERSION);
if (soc_version_env != nullptr) {
if (std::strcmp(soc_version, soc_version_env) != 0) {
MS_LOG(WARNING) << "SocVerison change to " << soc_version_env;
ret = rtSetSocVersion(soc_version_env);
if (ret != RT_ERROR_NONE) {
MS_LOG(EXCEPTION) << "SetSocVersion to " << soc_version_env << " failed, errorno: " << ret;
}
return soc_version_env;
}
}
return soc_version;
}
void TbeKernelJsonCreator::ParseAttrValue(const std::string &type, const mindspore::ValuePtr &value,
nlohmann::json *attr_obj) {
MS_EXCEPTION_IF_NULL(value);
@ -624,6 +659,8 @@ bool TbeKernelBuild::GenFusionScopeJson(const std::vector<mindspore::AnfNodePtr>
index = 0;
data_list.insert(data_list.end(), compute_list.begin(), compute_list.end());
(*fusion_json)[kFusionOpList] = data_list;
auto soc_version = TbeKernelJsonCreator::GetSocVersion();
(*fusion_json)[kJSocVersion] = soc_version;
return true;
}
@ -853,6 +890,7 @@ bool TbeKernelBuild::GenFusionDataInputJson(const std::shared_ptr<mindspore::Anf
(*data_str)[kJName] = name;
nlohmann::json output_desc;
output_desc[kJName] = name;
output_desc[kJDataType] = 0;
output_desc[kJShape] = "NULL";
output_desc_list.push_back(output_desc);
(*index)++;
@ -985,6 +1023,7 @@ bool TbeKernelBuild::GenFusionComputeInputJson(const mindspore::CNodePtr &cnode,
for (size_t i = 0; i < optional_num; ++i) {
nlohmann::json optional_input_desc;
optional_input_desc[kJName] = std::string(kOptional) + std::to_string(*index);
optional_input_desc[kJShape] = "NULL";
(*index)++;
(*layer_iter)->emplace_back(nullptr);
input_desc_list_tmp.emplace_back(optional_input_desc);

View File

@ -92,6 +92,7 @@ class TbeKernelJsonCreator {
std::string json_name() { return json_name_; }
bool GenTbeAttrJson(const std::shared_ptr<AnfNode> &anf_node, const std::shared_ptr<OpInfo> &op_info,
nlohmann::json *attrs_json);
static string GetSocVersion();
private:
bool GenTbeInputsJson(const std::shared_ptr<AnfNode> &anf_node, const std::shared_ptr<OpInfo> &op_info,

View File

@ -19,6 +19,7 @@
#include <memory>
#include <string>
#include "backend/optimizer/common/optimizer.h"
#include "backend/optimizer/ascend/ir_fission/dynamic_rnn_grad_fission.h"
#include "backend/optimizer/ascend/ir_fission/bn_split.h"
#include "backend/optimizer/ascend/ir_fission/bn_grad_split.h"
#include "backend/optimizer/ascend/ir_fission/batch_norm_grad_split.h"
@ -64,6 +65,7 @@
#include "backend/optimizer/ascend/format_type/rectify_do_mask_kernel_info.h"
#include "backend/optimizer/ascend/format_type/chang_axis_of_reduce_kernel.h"
#include "backend/optimizer/ascend/format_type/split_unsupported_transdata.h"
#include "backend/optimizer/ascend/format_type/insert_reshape_for_extract_image_patches_op.h"
#include "backend/optimizer/pass/getitem_tuple.h"
#include "backend/optimizer/pass/optimize_dependence.h"
#include "backend/optimizer/pass/erase_visit_attr.h"
@ -106,6 +108,7 @@
#include "backend/optimizer/ascend/ir_fission/concat_fission.h"
#include "backend/optimizer/ascend/ir_fission/pack_fission.h"
#include "backend/optimizer/ascend/enhancer/concat_outputs_for_all_gather.h"
#include "backend/optimizer/ascend/enhancer/add_placeholder_for_dynamic_rnn.h"
#include "utils/ms_context.h"
#include "backend/optimizer/graph_kernel/composite_ops_fusion.h"
#include "backend/optimizer/graph_kernel/basic_ops_fusion.h"
@ -231,6 +234,7 @@ void AscendMixPrecision(const std::shared_ptr<session::KernelGraph> &kernel_grap
auto optimizer = std::make_shared<GraphOptimizer>();
auto mixed_precision_pm = std::make_shared<PassManager>("cast_pm");
mixed_precision_pm->AddPass(std::make_shared<InsertCast>());
mixed_precision_pm->AddPass(std::make_shared<InsertReshapeForExtractImagePatchesOp>());
mixed_precision_pm->AddPass(std::make_shared<GetitemTuple>());
mixed_precision_pm->AddPass(std::make_shared<CommonSubexpressionElimination>());
mixed_precision_pm->AddPass(std::make_shared<EliminateRedundantOp>());
@ -276,6 +280,8 @@ void AscendBackendIRFusionOptimization(const std::shared_ptr<session::KernelGrap
}
ir_fusion_pm->AddPass(std::make_shared<LayerNormGradSplit>());
ir_fusion_pm->AddPass(std::make_shared<InsertPadForNMSWithMask>());
ir_fusion_pm->AddPass(std::make_shared<InsertPlaceholderForDynamicRNN>());
ir_fusion_pm->AddPass(std::make_shared<DynamicRNNGradFission>());
AddAscendIRFusionRulesPass(ir_fusion_pm.get());
AddAscendIRFusionPass(ir_fusion_pm.get());

View File

@ -0,0 +1,77 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "backend/optimizer/ascend/enhancer/add_placeholder_for_dynamic_rnn.h"
#include <vector>
#include <memory>
#include "backend/optimizer/common/helper.h"
#include "backend/session/anf_runtime_algorithm.h"
#include "utils/utils.h"
#include "abstract/abstract_value.h"
#include "base/core_ops.h"
namespace mindspore {
namespace opt {
const BaseRef InsertPlaceholderForDynamicRNN::DefinePattern() const {
std::shared_ptr<Var> V = std::make_shared<CondVar>(UnVisited);
std::shared_ptr<Var> Xs = std::make_shared<SeqVar>();
return VectorRef({V, Xs});
}
const AnfNodePtr InsertPlaceholderForDynamicRNN::Process(const FuncGraphPtr &func_graph, const AnfNodePtr &node,
const EquivPtr &) const {
MS_EXCEPTION_IF_NULL(func_graph);
MS_EXCEPTION_IF_NULL(node);
auto cnode = node->cast<CNodePtr>();
MS_EXCEPTION_IF_NULL(cnode);
auto op_name = AnfAlgo::GetCNodeName(cnode);
if (op_name != kDynamicRNNOpName) {
return nullptr;
}
AnfAlgo::SetNodeAttr(kAttrVisited, MakeValue(true), node);
auto kernel_graph = func_graph->cast<std::shared_ptr<session::KernelGraph>>();
MS_EXCEPTION_IF_NULL(kernel_graph);
size_t input_num = AnfAlgo::GetInputTensorNum(node);
if (input_num == 0) {
return nullptr;
}
std::vector<AnfNodePtr> new_inputs = {AnfAlgo::GetCNodePrimitiveNode(cnode)};
for (size_t in_idx = 0; in_idx < input_num; in_idx++) {
auto input_node = AnfAlgo::GetInputNode(cnode, in_idx);
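    // DynamicRNN expects an optional seq_length at input index 3; insert a None
    // placeholder there so the remaining inputs shift to their expected positions.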
if (in_idx == 3) {
auto value = std::make_shared<None>();
auto value_node = NewValueNode(value);
value_node->set_abstract(std::make_shared<abstract::AbstractNone>());
auto new_node = kernel_graph->NewValueNode(value_node);
kernel_graph->AddValueNodeToGraph(new_node);
new_inputs.push_back(new_node);
}
new_inputs.push_back(input_node);
}
  // kernel_graph was already null-checked above, so create the new node from it directly.
  CNodePtr new_node = kernel_graph->NewCNode(cnode);
MS_EXCEPTION_IF_NULL(new_node);
new_node->set_inputs(new_inputs);
return new_node;
}
} // namespace opt
} // namespace mindspore

View File

@ -0,0 +1,37 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef MINDSPORE_CCSRC_PRE_ACTIVATE_ASCEND_ENHANCER_ADD_PLACEHOLDER_FOR_DYNAMIC_RNN_H
#define MINDSPORE_CCSRC_PRE_ACTIVATE_ASCEND_ENHANCER_ADD_PLACEHOLDER_FOR_DYNAMIC_RNN_H
#include <memory>
#include <vector>
#include "backend/optimizer/common/optimizer.h"
#include "backend/optimizer/ascend/ascend_helper.h"
namespace mindspore {
namespace opt {
class InsertPlaceholderForDynamicRNN : public PatternProcessPass {
public:
explicit InsertPlaceholderForDynamicRNN(bool multigraph = true)
: PatternProcessPass("add_placeholder_for_dynamic_rnn", multigraph) {}
~InsertPlaceholderForDynamicRNN() override = default;
const BaseRef DefinePattern() const override;
const AnfNodePtr Process(const FuncGraphPtr &, const AnfNodePtr &, const EquivPtr &) const override;
};
} // namespace opt
} // namespace mindspore
#endif // MINDSPORE_CCSRC_PRE_ACTIVATE_ASCEND_ENHANCER_ADD_PLACEHOLDER_FOR_DYNAMIC_RNN_H

View File

@ -0,0 +1,65 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "backend/optimizer/ascend/format_type/insert_reshape_for_extract_image_patches_op.h"
#include <memory>
#include "backend/optimizer/ascend/ascend_helper.h"
#include "backend/session/anf_runtime_algorithm.h"
#include "utils/utils.h"
#include "base/core_ops.h"
namespace mindspore {
namespace opt {
const BaseRef InsertReshapeForExtractImagePatchesOp::DefinePattern() const {
VarPtr Xs = std::make_shared<SeqVar>();
return VectorRef({prim::kPrimExtractImagePatches, Xs});
}
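// Inserts a Reshape (flagged as a nop op) in front of ExtractImagePatches so the
// inferred input shape is permuted from NCHW to NHWC while staying in NC1HWC0 format.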
const AnfNodePtr InsertReshapeForExtractImagePatchesOp::Process(const FuncGraphPtr &func_graph, const AnfNodePtr &node,
const EquivPtr &equiv) const {
MS_EXCEPTION_IF_NULL(func_graph);
MS_EXCEPTION_IF_NULL(equiv);
auto extract = CheckAnfNodeIfCNodeAndInputSize(node, 2);
MS_EXCEPTION_IF_NULL(extract);
auto in_node = extract->input(1);
MS_EXCEPTION_IF_NULL(in_node);
auto extract_kernel_build_info = AnfAlgo::GetSelectKernelBuildInfo(extract);
auto in_node_kernel_build_info = AnfAlgo::GetSelectKernelBuildInfo(in_node);
MS_EXCEPTION_IF_NULL(extract_kernel_build_info);
MS_EXCEPTION_IF_NULL(in_node_kernel_build_info);
std::vector<AnfNodePtr> reshape_inputs = {NewValueNode(std::make_shared<Primitive>(prim::kPrimReshape->name())),
in_node};
auto reshape_builder = std::make_shared<kernel::KernelBuildInfo::KernelBuildInfoBuilder>();
reshape_builder->SetInputsFormat({kOpFormat_NC1HWC0});
reshape_builder->SetOutputsFormat({kOpFormat_NC1HWC0});
reshape_builder->SetInputsDeviceType({AnfAlgo::GetOutputDeviceDataType(in_node, 0)});
reshape_builder->SetOutputsDeviceType({AnfAlgo::GetOutputDeviceDataType(in_node, 0)});
reshape_builder->SetKernelType(in_node_kernel_build_info->kernel_type());
reshape_builder->SetFusionType(in_node_kernel_build_info->fusion_type());
reshape_builder->SetProcessor(in_node_kernel_build_info->processor());
auto reshape = func_graph->NewCNode(reshape_inputs);
reshape->set_scope(in_node->scope());
auto shape_tmp = AnfAlgo::GetOutputInferShape(in_node, 0);
AnfAlgo::SetOutputInferTypeAndShape({AnfAlgo::GetOutputDeviceDataType(in_node, 0)},
{{shape_tmp[0], shape_tmp[2], shape_tmp[3], shape_tmp[1]}}, reshape.get());
AnfAlgo::SetSelectKernelBuildInfo(reshape_builder->Build(), reshape.get());
AnfAlgo::SetNodeAttr("nop_op", MakeValue(true), reshape);
AnfAlgo::SetNodeInput(extract, reshape, 0);
return extract;
}
} // namespace opt
} // namespace mindspore

View File

@ -0,0 +1,41 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef MINDSPORE_CCSRC_BACKEND_OPTIMIZER_ASCEND_FORMAT_TYPE_INSERT_RESHAPE_FOR_EXTRACT_IMAGE_PATCHES_OP_H
#define MINDSPORE_CCSRC_BACKEND_OPTIMIZER_ASCEND_FORMAT_TYPE_INSERT_RESHAPE_FOR_EXTRACT_IMAGE_PATCHES_OP_H
#include <vector>
#include <string>
#include <utility>
#include <memory>
#include "ir/anf.h"
#include "backend/optimizer/common/pattern_engine.h"
#include "backend/optimizer/common/helper.h"
#include "backend/optimizer/common/optimizer.h"
namespace mindspore {
namespace opt {
class InsertReshapeForExtractImagePatchesOp : public PatternProcessPass {
public:
explicit InsertReshapeForExtractImagePatchesOp(bool multigraph = true)
: PatternProcessPass("insert_reshape_for_extract_image_patches_op", multigraph) {}
~InsertReshapeForExtractImagePatchesOp() override = default;
const BaseRef DefinePattern() const override;
const AnfNodePtr Process(const FuncGraphPtr &, const AnfNodePtr &, const EquivPtr &) const override;
};
} // namespace opt
} // namespace mindspore
#endif // MINDSPORE_CCSRC_BACKEND_OPTIMIZER_ASCEND_FORMAT_TYPE_INSERT_RESHAPE_FOR_EXTRACT_IMAGE_PATCHES_OP_H

View File

@ -0,0 +1,250 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "backend/optimizer/ascend/ir_fission/dynamic_rnn_grad_fission.h"
#include <vector>
#include <memory>
#include <algorithm>
#include "backend/session/anf_runtime_algorithm.h"
#include "backend/optimizer/common/helper.h"
namespace mindspore {
namespace opt {
constexpr size_t kDynamicRNNGradInputNum = 16;
constexpr size_t kLSTMInputGradOutputNum = 4;
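// Splits DynamicRNNGrad into an LSTMInputGrad node plus explicit SplitV, BatchMatMul,
// and Concat nodes, so each piece maps onto an existing TBE kernel.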
const BaseRef DynamicRNNGradFission::DefinePattern() const {
VarPtr Xs = std::make_shared<SeqVar>();
return VectorRef({prim::kPrimDynamicRNNGrad, Xs});
}
AnfNodePtr CreateSplitVD(const FuncGraphPtr &graph, const AnfNodePtr &node) {
MS_EXCEPTION_IF_NULL(graph);
MS_EXCEPTION_IF_NULL(node);
// SplitV
std::vector<AnfNodePtr> splitvd_input = {NewValueNode(std::make_shared<Primitive>(prim::kPrimSplitV->name())), node};
auto split_vd = graph->NewCNode(splitvd_input);
MS_EXCEPTION_IF_NULL(split_vd);
auto dtypes = {AnfAlgo::GetOutputInferDataType(node, 0), AnfAlgo::GetOutputInferDataType(node, 0)};
std::vector<size_t> shape = {AnfAlgo::GetOutputInferShape(node, 0)[0] - 1, AnfAlgo::GetOutputInferShape(node, 0)[1],
AnfAlgo::GetOutputInferShape(node, 0)[2]};
auto shape2 = {IntToSize(1), AnfAlgo::GetOutputInferShape(node, 0)[1], AnfAlgo::GetOutputInferShape(node, 0)[2]};
std::vector<std::vector<size_t>> shapes = {shape, shape2};
AnfAlgo::SetOutputInferTypeAndShape(dtypes, shapes, split_vd.get());
AnfAlgo::SetNodeAttr("split_dim", MakeValue(0), split_vd);
AnfAlgo::SetNodeAttr("num_split", MakeValue(2), split_vd);
int tmp = SizeToInt(AnfAlgo::GetOutputInferShape(node, 0)[0]) - 1;
AnfAlgo::SetNodeAttr("size_splits", MakeValue(std::vector<int>{tmp, 1}), split_vd);
AnfAlgo::SetNodeAttr("is_backend_insert", MakeValue(true), split_vd);
return split_vd;
}
AnfNodePtr CreateLSTMInputGrad(const FuncGraphPtr &graph, const AnfNodePtr &node) {
MS_EXCEPTION_IF_NULL(graph);
MS_EXCEPTION_IF_NULL(node);
auto cnode = node->cast<CNodePtr>();
MS_EXCEPTION_IF_NULL(cnode);
const auto &dynamic_rnn_grad_inputs = cnode->inputs();
std::vector<AnfNodePtr> lstm_input_grad_inputs = {NewValueNode(std::make_shared<Primitive>(kLSTMInputGradOpName)),
dynamic_rnn_grad_inputs[2],
dynamic_rnn_grad_inputs[6],
dynamic_rnn_grad_inputs[8],
dynamic_rnn_grad_inputs[9],
dynamic_rnn_grad_inputs[10],
dynamic_rnn_grad_inputs[11],
dynamic_rnn_grad_inputs[12],
dynamic_rnn_grad_inputs[13],
dynamic_rnn_grad_inputs[14],
dynamic_rnn_grad_inputs[15],
dynamic_rnn_grad_inputs[16]};
std::vector<AnfNodePtr> ori_outputs;
CreateMultipleOutputsOfAnfNode(graph, node, 5, &ori_outputs);
auto lstm_op = graph->NewCNode(lstm_input_grad_inputs);
MS_EXCEPTION_IF_NULL(lstm_op);
auto ori_type = AnfAlgo::GetOutputInferDataType(dynamic_rnn_grad_inputs[8], 0);
auto types = {AnfAlgo::GetOutputInferDataType(ori_outputs[2], 0), AnfAlgo::GetOutputInferDataType(ori_outputs[3], 0),
AnfAlgo::GetOutputInferDataType(ori_outputs[4], 0), ori_type};
std::vector<size_t> ori_shape = {AnfAlgo::GetOutputInferShape(dynamic_rnn_grad_inputs[8], 0)[0],
AnfAlgo::GetOutputInferShape(dynamic_rnn_grad_inputs[8], 0)[1],
4 * AnfAlgo::GetOutputInferShape(dynamic_rnn_grad_inputs[8], 0)[2]};
auto shapes = {AnfAlgo::GetOutputInferShape(ori_outputs[2], 0), AnfAlgo::GetOutputInferShape(ori_outputs[3], 0),
AnfAlgo::GetOutputInferShape(ori_outputs[4], 0), ori_shape};
AnfAlgo::SetOutputInferTypeAndShape(types, shapes, lstm_op.get());
return lstm_op;
}
AnfNodePtr CreateBatchMatMul(const FuncGraphPtr &graph, const AnfNodePtr &node1, const AnfNodePtr &node2) {
MS_EXCEPTION_IF_NULL(graph);
MS_EXCEPTION_IF_NULL(node1);
MS_EXCEPTION_IF_NULL(node2);
// BatchMatMul
std::vector<AnfNodePtr> matmul_inputs = {NewValueNode(std::make_shared<Primitive>(prim::kPrimBatchMatMul->name())),
node2, node1};
auto batch_matmul = graph->NewCNode(matmul_inputs);
MS_EXCEPTION_IF_NULL(batch_matmul);
auto types = {AnfAlgo::GetOutputInferDataType(node1, 0)};
std::vector<size_t> shape = {AnfAlgo::GetOutputInferShape(node2, 0)[0], AnfAlgo::GetOutputInferShape(node2, 0)[2],
AnfAlgo::GetOutputInferShape(node1, 0)[2]};
auto shapes = {shape};
AnfAlgo::SetNodeAttr("is_backend_insert", MakeValue(true), batch_matmul);
AnfAlgo::SetNodeAttr("transpose_x1", MakeValue(true), batch_matmul);
AnfAlgo::SetNodeAttr("transpose_x2", MakeValue(false), batch_matmul);
AnfAlgo::SetOutputInferTypeAndShape(types, shapes, batch_matmul.get());
return batch_matmul;
}
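
The BatchMatMul wiring above relies on a fixed shape rule: with `transpose_x1` set, the first input is read with its last two axes swapped before the contraction. A minimal standalone sketch of that rule follows; the `[t, batch, ...]` dimension values are illustrative only, not taken from the pass.

```
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdio>

using Shape3 = std::array<std::size_t, 3>;

// With transpose_x1=true, x1 [B, M, K] is read as [B, K, M], so x1^T (.) x2
// with x2 [B, M, N] yields [B, K, N] -- the {node2[0], node2[2], node1[2]}
// shape the pass assigns above.
Shape3 BatchMatMulShape(const Shape3 &x1, const Shape3 &x2) {
  assert(x1[0] == x2[0] && "batch dims must match");
  assert(x1[1] == x2[1] && "contracted dims must match when x1 is transposed");
  return {x1[0], x1[2], x2[2]};
}

int main() {
  Shape3 concat = {16, 8, 96};  // node2: [t, batch, input+hidden] (example values)
  Shape3 dgate = {16, 8, 256};  // node1: [t, batch, 4*hidden]
  Shape3 dw = BatchMatMulShape(concat, dgate);
  std::printf("dw: [%zu, %zu, %zu]\n", dw[0], dw[1], dw[2]);  // dw: [16, 96, 256]
}
```

Per time step this contracts over the batch axis, producing `[t, input+hidden, 4*hidden]`; the ReduceSum over axis 0 added later in this file then sums away the time axis.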
AnfNodePtr AddHConcatD(const FuncGraphPtr &graph, const AnfNodePtr &node1, const AnfNodePtr &node2) {
MS_EXCEPTION_IF_NULL(graph);
MS_EXCEPTION_IF_NULL(node1);
MS_EXCEPTION_IF_NULL(node2);
std::vector<AnfNodePtr> ori_outputs;
CreateMultipleOutputsOfAnfNode(graph, node2, 2, &ori_outputs);
auto ori_shape = AnfAlgo::GetOutputInferShape(node1, 0);
std::vector<std::vector<size_t>> shape_tmp;
if (ori_shape.size() == 3) {
shape_tmp = {ori_shape};
} else {
shape_tmp = {{IntToSize(1), ori_shape[0], ori_shape[1]}};
}
auto ori_dtype = {AnfAlgo::GetOutputInferDataType(node1, 0)};
// reshape
std::vector<AnfNodePtr> reshape_input = {NewValueNode(std::make_shared<Primitive>(prim::kPrimReshape->name())),
node1};
auto reshape = graph->NewCNode(reshape_input);
AnfAlgo::SetNodeAttr(kAttrVisited, MakeValue(true), reshape);
AnfAlgo::SetOutputInferTypeAndShape(ori_dtype, shape_tmp, reshape.get());
// concatd --> concat
std::vector<AnfNodePtr> concat_inputs = {NewValueNode(std::make_shared<Primitive>(prim::kPrimConcat->name())),
reshape, ori_outputs[0]};
auto concat_op = graph->NewCNode(concat_inputs);
MS_EXCEPTION_IF_NULL(concat_op);
std::vector<size_t> input = {AnfAlgo::GetOutputInferShape(node2, 0)[0] + 1, AnfAlgo::GetOutputInferShape(node2, 0)[1],
AnfAlgo::GetOutputInferShape(node2, 0)[2]};
auto types = {AnfAlgo::GetOutputInferDataType(node1, 0)};
auto shapes = {input};
AnfAlgo::SetOutputInferTypeAndShape(types, shapes, concat_op.get());
AnfAlgo::SetNodeAttr(kAttrN, MakeValue(2), concat_op);
AnfAlgo::SetNodeAttr(kAttrDynInputSizes, MakeValue(std::vector<int>{2}), concat_op);
AnfAlgo::SetNodeAttr("axis", MakeValue(0), concat_op);
AnfAlgo::SetNodeAttr("is_backend_insert", MakeValue(true), concat_op);
return concat_op;
}
AnfNodePtr AddConcatD(const FuncGraphPtr &graph, const AnfNodePtr &node1, const AnfNodePtr &node2) {
MS_EXCEPTION_IF_NULL(graph);
MS_EXCEPTION_IF_NULL(node1);
MS_EXCEPTION_IF_NULL(node2);
// concatd --> concat
std::vector<AnfNodePtr> concat_inputs = {NewValueNode(std::make_shared<Primitive>(prim::kPrimConcat->name())), node1,
node2};
auto concat_op = graph->NewCNode(concat_inputs);
MS_EXCEPTION_IF_NULL(concat_op);
std::vector<size_t> input = {AnfAlgo::GetOutputInferShape(node1, 0)[0], AnfAlgo::GetOutputInferShape(node1, 0)[1],
AnfAlgo::GetOutputInferShape(node1, 0)[2] + AnfAlgo::GetOutputInferShape(node2, 0)[2]};
auto types = {AnfAlgo::GetOutputInferDataType(node1, 0)};
auto shapes = {input};
AnfAlgo::SetOutputInferTypeAndShape(types, shapes, concat_op.get());
AnfAlgo::SetNodeAttr(kAttrN, MakeValue(2), concat_op);
AnfAlgo::SetNodeAttr(kAttrDynInputSizes, MakeValue(std::vector<int>{2}), concat_op);
AnfAlgo::SetNodeAttr("axis", MakeValue(2), concat_op);
AnfAlgo::SetNodeAttr("is_backend_insert", MakeValue(true), concat_op);
return concat_op;
}
AnfNodePtr AddDwReduceSum(const FuncGraphPtr &graph, const AnfNodePtr &node1, const AnfNodePtr &node2) {
// node1 : dynamic output
// node2 : matmul
MS_EXCEPTION_IF_NULL(graph);
MS_EXCEPTION_IF_NULL(node1);
MS_EXCEPTION_IF_NULL(node2);
std::vector<AnfNodePtr> ori_outputs;
CreateMultipleOutputsOfAnfNode(graph, node1, 5, &ori_outputs);
// ReduceSumd
std::vector<AnfNodePtr> reducesum_inputs = {NewValueNode(std::make_shared<Primitive>(prim::kPrimReduceSum->name())),
node2};
auto reduce_sumd = graph->NewCNode(reducesum_inputs);
MS_EXCEPTION_IF_NULL(reduce_sumd);
auto types = {AnfAlgo::GetOutputInferDataType(ori_outputs[0], 0)};
auto shapes = {AnfAlgo::GetOutputInferShape(ori_outputs[0], 0)};
AnfAlgo::SetOutputInferTypeAndShape(types, shapes, reduce_sumd.get());
AnfAlgo::SetNodeAttr(kAttrAxis, MakeValue(std::vector<int>{0}), reduce_sumd);
AnfAlgo::SetNodeAttr("keep_dims", MakeValue(false), reduce_sumd);
AnfAlgo::SetNodeAttr("is_backend_insert", MakeValue(true), reduce_sumd);
return reduce_sumd;
}
AnfNodePtr AddDbReduceSum(const FuncGraphPtr &graph, const AnfNodePtr &node1, const AnfNodePtr &node2) {
// node1 : lstm output
// node2 : dynamic output
MS_EXCEPTION_IF_NULL(graph);
MS_EXCEPTION_IF_NULL(node1);
MS_EXCEPTION_IF_NULL(node2);
std::vector<AnfNodePtr> ori_outputs;
CreateMultipleOutputsOfAnfNode(graph, node2, 5, &ori_outputs);
// ReduceSumd --> ReduceSum
std::vector<AnfNodePtr> reducesum_inputs = {NewValueNode(std::make_shared<Primitive>(prim::kPrimReduceSum->name())),
node1};
auto reduce_sumd = graph->NewCNode(reducesum_inputs);
MS_EXCEPTION_IF_NULL(reduce_sumd);
auto types = {AnfAlgo::GetOutputInferDataType(ori_outputs[1], 0)};
auto shapes = {AnfAlgo::GetOutputInferShape(ori_outputs[1], 0)};
AnfAlgo::SetOutputInferTypeAndShape(types, shapes, reduce_sumd.get());
AnfAlgo::SetNodeAttr(kAttrAxis, MakeValue(std::vector<int>{0, 1}), reduce_sumd);
AnfAlgo::SetNodeAttr("keep_dims", MakeValue(false), reduce_sumd);
AnfAlgo::SetNodeAttr("is_backend_insert", MakeValue(true), reduce_sumd);
return reduce_sumd;
}
const AnfNodePtr DynamicRNNGradFission::Process(const FuncGraphPtr &func_graph, const AnfNodePtr &node,
const EquivPtr &) const {
MS_EXCEPTION_IF_NULL(func_graph);
MS_EXCEPTION_IF_NULL(node);
auto cnode = node->cast<CNodePtr>();
MS_EXCEPTION_IF_NULL(cnode);
if (cnode->size() < kDynamicRNNGradInputNum + 1) {
MS_LOG(INFO) << "The input num of DynamicRNNGrad less than" << kDynamicRNNGradInputNum
<< ". The node should not be changed";
return nullptr;
}
// input_list of dynamic_rnn_grad
const auto &ori_inputs = cnode->inputs();
// create split_vd
auto split_vd = CreateSplitVD(func_graph, ori_inputs[7]);
// create concat_1
auto h_concat = AddHConcatD(func_graph, ori_inputs[5], split_vd);
// create concat_2
auto concat = AddConcatD(func_graph, ori_inputs[1], h_concat);
// create lsym_input_grad
auto lstm_input_grad = CreateLSTMInputGrad(func_graph, cnode);
std::vector<AnfNodePtr> lstm_outputs;
CreateMultipleOutputsOfAnfNode(func_graph, lstm_input_grad, kLSTMInputGradOutputNum, &lstm_outputs);
// create matmul
auto batch_matmul = CreateBatchMatMul(func_graph, lstm_outputs[3], concat);
// create reduce_sum_1
auto dw_reduce_sum = AddDwReduceSum(func_graph, node, batch_matmul);
// create reduce_sum_2
auto db_reduce_sum = AddDbReduceSum(func_graph, lstm_outputs[3], node);
std::vector<AnfNodePtr> make_tuple_inputs = {NewValueNode(prim::kPrimMakeTuple),
dw_reduce_sum,
db_reduce_sum,
lstm_outputs[0],
lstm_outputs[1],
lstm_outputs[2]};
auto make_tuple = func_graph->NewCNode(make_tuple_inputs);
MS_EXCEPTION_IF_NULL(make_tuple);
return make_tuple;
}
} // namespace opt
} // namespace mindspore

View File

@ -0,0 +1,33 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef MINDSPORE_CCSRC_BACKEND_OPTIMIZER_ASCEND_IR_FISSION_DYNAMIC_RNN_GRAD_FISSION_H_
#define MINDSPORE_CCSRC_BACKEND_OPTIMIZER_ASCEND_IR_FISSION_DYNAMIC_RNN_GRAD_FISSION_H_
#include "backend/optimizer/common/optimizer.h"
namespace mindspore {
namespace opt {
class DynamicRNNGradFission : public PatternProcessPass {
public:
explicit DynamicRNNGradFission(bool multigraph = true) : PatternProcessPass("dynamic_rnn_grad_fission", multigraph) {}
~DynamicRNNGradFission() override = default;
const BaseRef DefinePattern() const override;
const AnfNodePtr Process(const FuncGraphPtr &, const AnfNodePtr &, const EquivPtr &) const override;
};
} // namespace opt
} // namespace mindspore
#endif // MINDSPORE_CCSRC_BACKEND_OPTIMIZER_ASCEND_IR_FISSION_DYNAMIC_RNN_GRAD_FISSION_H_

View File

@ -21,11 +21,10 @@
namespace mindspore {
namespace opt {
const std::set<std::pair<string, string>> invalid_formats_pair = {{kOpFormat_C1HWNCoC0, kOpFormat_NCHW},
{kOpFormat_NCHW, kOpFormat_C1HWNCoC0},
{kOpFormat_C1HWNCoC0, kOpFormat_DEFAULT},
{kOpFormat_DEFAULT, kOpFormat_FRACTAL_ZN_LSTM},
{kOpFormat_DEFAULT, kOpFormat_C1HWNCoC0}};
const std::set<std::pair<string, string>> invalid_formats_pair = {
{kOpFormat_C1HWNCoC0, kOpFormat_NCHW}, {kOpFormat_NCHW, kOpFormat_C1HWNCoC0},
{kOpFormat_C1HWNCoC0, kOpFormat_DEFAULT}, {kOpFormat_DEFAULT, kOpFormat_FRACTAL_ZN_LSTM},
{kOpFormat_FRACTAL_ZN_LSTM, kOpFormat_DEFAULT}, {kOpFormat_DEFAULT, kOpFormat_C1HWNCoC0}};
bool TransDataSplit::Run(const FuncGraphPtr &func_graph) {
MS_EXCEPTION_IF_NULL(func_graph);
@ -83,6 +82,9 @@ bool TransDataSplit::DoSplit(const FuncGraphPtr &func_graph, const AnfNodePtr &n
new_transpose_node = NewTransOpNode(func_graph, AnfAlgo::GetInputNode(node->cast<CNodePtr>(), 0), kernel_select_,
false, prim::kPrimTranspose->name());
AnfAlgo::SetNodeAttr(kAttrPerm, MakeValue(std::vector<int>{2, 3, 1, 0}), new_transpose_node);
if (output_format == kOpFormat_FRACTAL_ZN_LSTM) {
AnfAlgo::SetNodeAttr("nop_op", MakeValue(true), new_transpose_node);
}
RefreshKernelBuildInfo(input_format, kOpFormat_HWCN, new_transpose_node);
// trans hwcn to output_format

View File

@ -404,7 +404,11 @@ bool IsNopNode(const AnfNodePtr &node) {
}
CNodePtr cnode = node->cast<CNodePtr>();
MS_EXCEPTION_IF_NULL(cnode);
if (nop_nodes.find(AnfAlgo::GetCNodeName(cnode)) == nop_nodes.end()) {
bool is_nop_node = false;
if (AnfAlgo::HasNodeAttr("nop_op", cnode)) {
is_nop_node = AnfAlgo::GetNodeAttr<bool>(cnode, "nop_op");
}
if (nop_nodes.find(AnfAlgo::GetCNodeName(cnode)) == nop_nodes.end() && !is_nop_node) {
return false;
}
return true;
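
Reduced to its essentials, the updated check treats a node as a no-op either by op name or by an explicit `nop_op` attribute, which is what lets a pass such as the TransDataSplit change above opt individual nodes in. A hedged standalone sketch; the op names in the set are illustrative, not the real nop list:

```
#include <iostream>
#include <map>
#include <set>
#include <string>

// A node counts as a no-op either because its op name is in the static nop
// set, or because a backend pass tagged it "nop_op" (as TransDataSplit now
// does for the FRACTAL_ZN_LSTM transpose).
bool IsNop(const std::string &op_name, const std::map<std::string, bool> &attrs) {
  static const std::set<std::string> kNopOps = {"Reshape", "ExpandDims", "Squeeze"};
  auto it = attrs.find("nop_op");
  bool tagged = it != attrs.end() && it->second;
  return kNopOps.count(op_name) > 0 || tagged;
}

int main() {
  std::cout << IsNop("Transpose", {}) << "\n";                  // 0: not a nop by name
  std::cout << IsNop("Transpose", {{"nop_op", true}}) << "\n";  // 1: tagged by a pass
}
```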

View File

@ -26,7 +26,7 @@ namespace mindspore {
namespace memreuse {
enum RefCountType { kDynamicRefCount, kStaticRefCount };
enum NodeType { kCommonNode, kCommunicationNode };
enum KernelRefType { kCommon, kRefNodeInput, kRefNodeOutput, kCommNotReuse, kCommReuse, kSummary };
enum KernelRefType { kCommon, kRefNodeInput, kRefNodeOutput, kCommNotReuse, kCommReuse, kSummary, kAllocated };
static constexpr int kInitIndex = -1;
class KernelRefCount {
public:

View File

@ -76,6 +76,9 @@ bool MemReuseUtil::InitDynamicOutputKernelRef() {
}
} else {
kernel_ref->type_ = kCommon;
if (AnfAlgo::OutputAddrExist(kernel_cnode, output_index)) {
kernel_ref->type_ = kAllocated;
}
}
}
kernel_refs.push_back(kernel_ref);

View File

@ -16,6 +16,7 @@
#include "backend/optimizer/mem_reuse/mem_reuse_allocator.h"
#include "backend/optimizer/mem_reuse/mem_reuse.h"
#include "backend/optimizer/mem_reuse/mem_reuse_checker.h"
#include "utils/ms_context.h"
#ifdef ENABLE_D
#include "runtime/device/ascend/ascend_stream_assign.h"
#endif
@ -32,12 +33,22 @@ void BestFitMemReuse::InitMemReuseInfo(const MemReuseUtil *mem_reuse_util_ptr) {
set_workspace_ptr_list(mem_reuse_util_ptr->total_wk_ref_list());
set_op_ptr_list(mem_reuse_util_ptr->kernel_def_ptr_list());
// check info Correctness
for (auto &tensor : tensor_ptr_list_) {
tensor->size_ = AlignCommonMemorySize(tensor->size_);
auto context_ptr = MsContext::GetInstance();
MS_EXCEPTION_IF_NULL(context_ptr);
std::string device_target = context_ptr->get_param<std::string>(MS_CTX_DEVICE_TARGET);
// Memory on the CPU device does not need to be aligned
if (device_target != kCPUDevice) {
MS_LOG(DEBUG) << "Align Memory Size";
for (auto &tensor : tensor_ptr_list_) {
tensor->size_ = AlignCommonMemorySize(tensor->size_);
}
// align wk size to 512
for (auto &wk : wk_tensor_list_) {
wk->size_ = AlignCommonMemorySize(wk->size_);
}
}
// align wk size to 512 && refcount == 1
// set wk refcount == 1
for (auto &wk : wk_tensor_list_) {
wk->size_ = AlignCommonMemorySize(wk->size_);
wk->ref_count_ = 1;
}
#ifdef ENABLE_D
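
The device-target branch above gates a round-up-to-512 alignment. A standalone sketch of that arithmetic, assuming the 512-byte granularity named in the `align wk size to 512` comment; the real `AlignCommonMemorySize` may add headroom beyond a plain round-up:

```
#include <cstddef>
#include <cstdio>

constexpr size_t kMemAlign = 512;

size_t AlignCommonSize(size_t size) {
  return ((size + kMemAlign - 1) / kMemAlign) * kMemAlign;  // round up to 512B
}

int main() {
  std::printf("%zu -> %zu\n", (size_t)1, AlignCommonSize(1));      // 1 -> 512
  std::printf("%zu -> %zu\n", (size_t)512, AlignCommonSize(512));  // 512 -> 512
  std::printf("%zu -> %zu\n", (size_t)513, AlignCommonSize(513));  // 513 -> 1024
}
```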
@ -155,6 +166,9 @@ void BestFitMemReuse::AssignCommonNodeOutputOffset() {
// get align size for communication op's single input
tensor_desc->size_ = AlignCommunicationMemorySize(tensor_desc->size_);
total_comm_reuse_size += tensor_desc->size_;
} else if (tensor_desc->type_ == kAllocated) {
// no need to alloc allocated memory
continue;
}
auto reusable_membuf_map = GetReusableMembufMap(tensor_desc->size_);

View File

@ -516,6 +516,10 @@ std::vector<size_t> AnfRuntimeAlgorithm::GetInputDeviceShape(const AnfNodePtr &n
if (trans::IsNeedPadding(format, infer_shape.size())) {
infer_shape = trans::PaddingShapeTo4d(infer_shape, GetInputReshapeType(node, input_idx));
}
if (node->isa<CNode>() && GetCNodeName(node) == kExtractImagePatchesOpName) {
auto shape_tmp = {infer_shape[0], infer_shape[3], infer_shape[1], infer_shape[2]};
return trans::TransShapeToDevice(shape_tmp, format);
}
return trans::TransShapeToDevice(infer_shape, format);
}

View File

@ -179,6 +179,7 @@ GraphId AscendSession::CompileGraph(NotNull<FuncGraphPtr> func_graph) {
debugger_->PreExecute(root_graph);
}
#endif
SetSummaryNodes(root_graph.get());
// alloc mem
MemoryAlloc(root_graph.get());
// generate and load task into device

View File

@ -75,6 +75,16 @@ GraphId CPUSession::CompileGraph(const AnfNodePtrList &lst, const AnfNodePtrList
#endif
MS_LOG(INFO) << "Build kernel";
BuildKernel(graph.get());
// Set graph execution order before memory alloc to ensure that memory alloc follows the reordered graph
auto execution_order = graph->execution_order();
Reorder(&execution_order);
graph->set_execution_order(execution_order);
MS_LOG(INFO) << "Start kernel runtime";
if (!runtime_.Init()) {
MS_LOG(EXCEPTION) << "CPU start kernel runtime failed";
}
MS_LOG(INFO) << "Assign kernel address";
runtime_.AssignKernelAddress(graph.get());
return graph_id;
@ -98,11 +108,8 @@ void CPUSession::RunGraph(const GraphId &graph_id, const std::vector<tensor::Ten
#endif
MS_LOG(INFO) << "Run graph start";
auto execution_order = kernel_graph->execution_order();
Reorder(&execution_order);
bool enable_summary = summary_callback_ != nullptr;
kernel_graph->set_execution_order(execution_order);
NamedSummaryOutputs summary_outputs;
if (enable_summary) {
SetSummaryNodes(kernel_graph.get());

View File

@ -125,7 +125,7 @@ class CreateCommGroupTask : public Task {
void Run() override;
std::string group_name_;
std::vector<uint32_t> ranks_;
bool result_;
bool result_{false};
};
class DestroyCommGroupTask : public Task {
@ -134,7 +134,7 @@ class DestroyCommGroupTask : public Task {
~DestroyCommGroupTask() override = default;
void Run() override;
std::string group_name_;
bool result_;
bool result_{false};
};
class ExitTask : public Task {

View File

@ -64,8 +64,11 @@ void E2eDumpUtil::DumpGPUMemToFile(const std::string &file_path, const std::stri
const ShapeVector &int_shapes, const TypeId &type, size_t slot, Debugger *debugger) {
#ifdef ENABLE_DEBUGGER
auto format = kOpFormat_DEFAULT;
MS_EXCEPTION_IF_NULL(debugger);
DebugServices *debug_services = debugger->debug_services();
MS_EXCEPTION_IF_NULL(debug_services);
TensorLoader *tensor_loader = debug_services->tensor_loader();
MS_EXCEPTION_IF_NULL(tensor_loader);
auto ret = tensor_loader->DumpTensorToFile(original_kernel_name, trans_flag, file_path, format, int_shapes, type,
addr->type_id(), addr->format(), slot);
if (!ret) {

View File

@ -309,7 +309,7 @@ void Debugger::PostDebugOp() {
}
}
std::map<std::pair<uint32_t, uint32_t>, std::string> &Debugger::GetStreamTaskToOpnameMap() {
std::map<std::pair<uint32_t, uint32_t>, std::string> Debugger::GetStreamTaskToOpnameMap() const {
return stream_task_to_opname_;
}
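
Returning the map by value matters because a caller (see the DataDumper change later in this diff) merges entries into the returned object via `std::inserter`; with the old by-reference getter, that merge silently mutated the debugger's private state. A standalone illustration of the difference:

```
#include <map>
#include <string>

class Registry {
 public:
  std::map<int, std::string> entries() const { return entries_; }  // copy out
 private:
  std::map<int, std::string> entries_{{1, "op_a"}};
};

int main() {
  Registry r;
  auto merged = r.entries();
  merged[2] = "op_b";                            // extends only the caller's copy
  return static_cast<int>(r.entries().size());   // still 1: internal map untouched
}
```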
@ -754,7 +754,8 @@ uint64_t BytestoInt64(const std::vector<char> &buffer) {
uint64_t ret;
ret = ((uint64_t)buffer[7] << 56) | ((uint64_t)buffer[6] << 48) | ((uint64_t)buffer[5] << 40) |
((uint64_t)buffer[4] << 32) | (buffer[3] << 24) | (buffer[2] << 16) | (buffer[1] << 8) | buffer[0];
((uint64_t)buffer[4] << 32) | ((uint64_t)buffer[3] << 24) | ((uint64_t)buffer[2] << 16) |
((uint64_t)buffer[1] << 8) | ((uint64_t)buffer[0]);
return ret;
}
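
The widening casts fix an integer-promotion hazard: a `char` shifted left by 24 is promoted only to `int`, so a byte like `0x80` overflows signed `int`, and negative `char` values sign-extend into the upper bits. A standalone sketch; note it additionally routes each byte through `unsigned char`, a stricter guard than the in-tree line takes:

```
#include <cstdint>
#include <cstdio>
#include <vector>

uint64_t BytesToU64(const std::vector<char> &b) {
  uint64_t ret = 0;
  for (int i = 7; i >= 0; --i) {
    ret = (ret << 8) | (uint64_t)(unsigned char)b[i];  // widen before shifting
  }
  return ret;
}

int main() {
  std::vector<char> buf = {0x01, 0x02, 0x03, (char)0x80, 0x00, 0x00, 0x00, 0x00};
  // Without the casts, (char)0x80 << 24 overflows int and can sign-extend.
  std::printf("0x%016llx\n", (unsigned long long)BytesToU64(buf));  // 0x0000000080030201
}
```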

View File

@ -94,7 +94,7 @@ class Debugger : public std::enable_shared_from_this<Debugger> {
int32_t step_num() const;
std::map<std::pair<uint32_t, uint32_t>, std::string> &GetStreamTaskToOpnameMap();
std::map<std::pair<uint32_t, uint32_t>, std::string> GetStreamTaskToOpnameMap() const;
// check if any feature that uses the debugger backend is enabled
bool DebuggerBackendEnabled();

View File

@ -79,9 +79,9 @@ std::vector<std::string> ChunkString(std::string str, int graph_size) {
}
std::string buffer;
buffer.resize(chunk_size);
errno_t err = memcpy_s(reinterpret_cast<char *>(buffer.data()), chunk_size, str.data() + size_iter, chunk_size);
auto err = memcpy_s(reinterpret_cast<char *>(buffer.data()), chunk_size, str.data() + size_iter, chunk_size);
if (err != 0) {
MS_LOG(ERROR) << "memcpy_s failed. err code is: " << err;
MS_LOG(EXCEPTION) << "memcpy_s failed. errorno is: " << err;
}
buf.push_back(buffer);
size_iter += CHUNK_SIZE;

View File

@ -255,7 +255,7 @@ class PynativeEliminater : public OptimizerCaller {
MS_LOG(DEBUG) << "Start FillZero";
ValuePtr out = nullptr;
if (value->isa<Int32Imm>()) {
return value;
return MakeValue(value->cast<Int32ImmPtr>()->value());
}
if (value->isa<tensor::Tensor>()) {
@ -298,9 +298,10 @@ class PynativeEliminater : public OptimizerCaller {
if (rep != nullptr) {
if (rep->isa<ValueNode>()) {
auto value_node = rep->cast<ValueNodePtr>();
value_node->set_value(FillZero(value_node->value()));
auto new_value_node = NewValueNode(FillZero(value_node->value()));
new_value_node->set_has_new_value(value_node->has_new_value());
MS_LOG(DEBUG) << "Zeros_like replace ok " << rep->DebugString(4);
return rep;
return new_value_node;
}
}
}
@ -315,9 +316,10 @@ class PynativeEliminater : public OptimizerCaller {
if (rep != nullptr) {
if (rep->isa<ValueNode>()) {
auto value_node = rep->cast<ValueNodePtr>();
value_node->set_value(FillZero(value_node->value()));
auto new_value_node = NewValueNode(FillZero(value_node->value()));
new_value_node->set_has_new_value(value_node->has_new_value());
MS_LOG(DEBUG) << "Zeros_like replace ok 2 " << rep->DebugString(4);
return rep;
return new_value_node;
}
}
}

View File

@ -69,7 +69,7 @@ PYBIND_REGISTER(Tensor, 0, ([](const py::module *m) {
auto &tensor = py::cast<Tensor &>(t);
if (tensor.type() == DataType::DE_STRING) {
py::array res;
tensor.GetDataAsNumpyStrings(&res);
THROW_IF_ERROR(tensor.GetDataAsNumpyStrings(&res));
return res;
}
py::buffer_info info;

View File

@ -30,6 +30,8 @@ namespace mindspore {
namespace dataset {
std::unique_ptr<Services> Services::instance_ = nullptr;
std::once_flag Services::init_instance_flag_;
std::set<std::string> Services::unique_id_list_ = {};
std::mutex Services::unique_id_mutex_;
#if !defined(_WIN32) && !defined(_WIN64) && !defined(__ANDROID__) && !defined(ANDROID)
std::string Services::GetUserName() {
@ -52,8 +54,23 @@ std::string Services::GetUniqueID() {
std::mt19937 gen = GetRandomDevice();
std::uniform_int_distribution<uint32_t> dist(0, kStr.size() - 1);
char buffer[UNIQUEID_LEN];
for (int i = 0; i < UNIQUEID_LEN; i++) {
buffer[i] = kStr[dist(gen)];
{
std::unique_lock<std::mutex> lock(unique_id_mutex_);
while (true) {
auto ret = memset_s(buffer, UNIQUEID_LEN, 0, UNIQUEID_LEN);
if (ret != 0) {
MS_LOG(ERROR) << "memset_s error, errorno(" << ret << ")";
return std::string("");
}
for (int i = 0; i < UNIQUEID_LEN; i++) {
buffer[i] = kStr[dist(gen)];
}
if (unique_id_list_.find(std::string(buffer, UNIQUEID_LEN)) != unique_id_list_.end()) {
continue;
}
unique_id_list_.insert(std::string(buffer, UNIQUEID_LEN));
break;
}
}
return std::string(buffer, UNIQUEID_LEN);
}
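
The new loop is a check-and-insert-under-lock pattern: regenerate until the candidate is absent from the process-wide set. A reduced sketch, where the names and the 16-character length are placeholders and the in-tree version also zeroes its buffer with `memset_s` each round:

```
#include <mutex>
#include <random>
#include <set>
#include <string>

std::string NewUniqueId(std::set<std::string> *seen, std::mutex *mu, size_t len = 16) {
  static const std::string kChars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
  std::mt19937 gen(std::random_device{}());
  std::uniform_int_distribution<size_t> dist(0, kChars.size() - 1);
  std::lock_guard<std::mutex> lock(*mu);  // check-and-insert must be atomic
  while (true) {
    std::string id;
    for (size_t i = 0; i < len; ++i) {
      id += kChars[dist(gen)];
    }
    if (seen->insert(id).second) {  // insert() reports uniqueness directly
      return id;
    }
  }
}

int main() {
  std::set<std::string> seen;
  std::mutex mu;
  return NewUniqueId(&seen, &mu) == NewUniqueId(&seen, &mu) ? 1 : 0;  // always 0
}
```

Using `insert()`'s return value collapses the separate find-then-insert pair in the in-tree loop into a single lookup.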

View File

@ -19,6 +19,7 @@
#include <algorithm>
#include <memory>
#include <mutex>
#include <set>
#include <string>
#include <vector>
#include "minddata/dataset/util/memory_pool.h"
@ -97,6 +98,8 @@ class Services {
private:
static std::once_flag init_instance_flag_;
static std::unique_ptr<Services> instance_;
static std::set<std::string> unique_id_list_;
static std::mutex unique_id_mutex_;
// A small pool used for small objects that last until the
// Services Manager shuts down. Used by all sub-services.
std::shared_ptr<MemoryPool> pool_;

View File

@ -1790,38 +1790,92 @@ bool UpdateFuncGraphFlags(py::object obj, const FuncGraphPtr &func_graph) {
return true;
}
// Generate and copy a ValueNode, or a CNode with its child nodes
static AnfNodePtr CopyNodesFromParamDefaultValue(const FuncGraphPtr func_graph, const AnfNodePtr &param_node) {
MS_EXCEPTION_IF_NULL(param_node);
if (param_node->isa<ValueNode>()) {
return std::make_shared<ValueNode>(param_node->cast<ValueNodePtr>()->value());
}
// Parameter default value is CNode.
std::size_t index = 0;
std::vector<AnfNodePtr> old_cnodes;
old_cnodes.emplace_back(param_node);
auto res = func_graph->NewCNode({});
std::vector<CNodePtr> new_cnodes;
new_cnodes.emplace_back(res);
while (index < old_cnodes.size()) {
auto current = old_cnodes[index];
auto current_new_cnode = new_cnodes[index];
index++;
MS_EXCEPTION_IF_NULL(current);
if (current->isa<CNode>()) {
auto &inputs = current->cast<CNodePtr>()->inputs();
for (auto it = inputs.begin(); it != inputs.end(); it++) {
AnfNodePtr input = *it;
if (input != nullptr && input->isa<CNode>()) {
old_cnodes.emplace_back(input);
auto new_cnode = func_graph->NewCNode({});
new_cnodes.emplace_back(new_cnode);
current_new_cnode->add_input(new_cnode);
} else if (input->isa<ValueNode>()) {
current_new_cnode->add_input(std::make_shared<ValueNode>(input->cast<ValueNodePtr>()->value()));
} else {
MS_LOG(EXCEPTION) << "Wrong type item in default parameters: " << input->ToString();
}
}
}
}
return res;
}
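
`CopyNodesFromParamDefaultValue` is an iterative breadth-first clone driven by two parallel worklists indexed in lockstep. The same traversal on a toy node type, with MindSpore's CNode/ValueNode split elided:

```
#include <memory>
#include <vector>

struct Node {
  int value = 0;
  std::vector<std::shared_ptr<Node>> children;
};

std::shared_ptr<Node> CloneBFS(const std::shared_ptr<Node> &root) {
  auto copy_root = std::make_shared<Node>();
  copy_root->value = root->value;
  std::vector<std::shared_ptr<Node>> olds = {root};
  std::vector<std::shared_ptr<Node>> news = {copy_root};
  for (size_t i = 0; i < olds.size(); ++i) {  // index-based queue, as above
    for (const auto &child : olds[i]->children) {
      auto copy_child = std::make_shared<Node>();
      copy_child->value = child->value;
      olds.push_back(child);                    // olds[k] and news[k] stay paired
      news.push_back(copy_child);
      news[i]->children.push_back(copy_child);  // link copy to its copied parent
    }
  }
  return copy_root;
}

int main() {
  auto root = std::make_shared<Node>();
  root->value = 1;
  root->children.push_back(std::make_shared<Node>());
  root->children[0]->value = 2;
  auto copy = CloneBFS(root);
  return copy->children[0]->value == 2 ? 0 : 1;  // deep copy preserved values
}
```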
FuncGraphPtr MakeTopGraph(const py::object &cell, const ValuePtr &cell_ptr) {
auto current_graph = dyn_cast<FuncGraph>(cell_ptr);
if (current_graph == nullptr) {
MS_LOG(EXCEPTION) << "Current graph cast failed from " << cell_ptr->ToString();
}
auto func_graph = std::make_shared<FuncGraph>();
func_graph->debug_info()->set_name("top");
func_graph->debug_info()->set_name(current_graph->debug_info()->name() + "_wrapper");
// def top(*arg, *kwargs):
auto param_vargs = func_graph->add_parameter();
auto args_name = "args";
param_vargs->set_name(args_name);
param_vargs->debug_info()->set_name(args_name);
auto param_vkwargs = func_graph->add_parameter();
args_name = "kwargs";
param_vkwargs->set_name(args_name);
param_vkwargs->debug_info()->set_name(args_name);
func_graph->set_has_vararg(true);
func_graph->set_has_kwarg(true);
func_graph->set_kwonlyargs_count(0);
// Copy all parameters information
for (auto &para : current_graph->parameters()) {
auto param = func_graph->add_parameter();
auto orig_param = para->cast<ParameterPtr>();
auto name = orig_param->name();
param->set_name(name);
param->debug_info()->set_name(name);
}
func_graph->set_has_vararg(current_graph->has_vararg());
func_graph->set_has_kwarg(current_graph->has_kwarg());
func_graph->set_kwonlyargs_count(current_graph->kwonlyargs_count());
// Copy all default values
for (auto &d : current_graph->parameter_default_value()) {
func_graph->set_param_default_value(d.first, CopyNodesFromParamDefaultValue(func_graph, d.second));
}
// cell_obj
MS_LOG(DEBUG) << "add Flag for " << std::string(py::str(cell));
parse::UpdateFuncGraphFlags(cell, func_graph);
// top graph's construct flag
if (py::hasattr(cell, "construct")) {
parse::UpdateFuncGraphFlags(cell.attr("construct"), func_graph);
}
// ret = cell_obj(*arg, *kwargs)
auto call_fn = MakeUnpackCall(func_graph, NewValueNode(cell_ptr), {param_vargs, param_vkwargs});
// return ret
func_graph->set_output(call_fn);
MS_LOG(DEBUG) << "add Flag for " << std::string(py::str(cell));
auto unpacking = func_graph->has_vararg() || func_graph->has_kwarg();
if (!unpacking) {
std::vector<AnfNodePtr> inputs;
inputs.emplace_back(NewValueNode(cell_ptr));
auto &params = func_graph->parameters();
(void)std::transform(params.begin(), params.end(), std::back_inserter(inputs),
[](AnfNodePtr node) -> AnfNodePtr { return node; });
func_graph->set_output(func_graph->NewCNode(inputs));
} else {
// ret = cell_obj(*arg, *kwargs)
auto call_fn = MakeUnpackCall(func_graph, NewValueNode(cell_ptr), func_graph->parameters());
// return ret
func_graph->set_output(call_fn);
}
return func_graph;
}
} // namespace parse

View File

@ -260,7 +260,7 @@ py::object DoAutoCast(const py::object &arg, const TypeId &type_id) {
py::object DoParamMixPrecisionCast(bool *is_cast, const py::object obj) {
auto tensor = py::cast<tensor::TensorPtr>(obj);
auto cast_type = tensor->cast_dtype();
py::object cast_output;
py::object cast_output = obj;
if (cast_type != nullptr) {
auto source_element = tensor->Dtype();
if (source_element != nullptr && IsSubType(source_element, kFloat) && *source_element != *cast_type) {
@ -282,12 +282,15 @@ py::object DoParamMixPrecisionCastTuple(bool *is_cast, const py::tuple tuple) {
result[i] = DoParamMixPrecisionCast(is_cast, tuple[i]);
} else if (py::isinstance<py::tuple>(tuple[i])) {
result[i] = DoParamMixPrecisionCastTuple(is_cast, tuple[i]);
} else {
result[i] = tuple[i];
}
}
return result;
}
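
The one-line change to `DoParamMixPrecisionCast` fixes a path that returned a default-constructed `py::object`, which holds a null `PyObject*`. A MindSpore-free distillation of the bug using a stand-in handle type:

```
#include <cassert>
#include <memory>

struct Handle {               // stand-in for py::object
  std::shared_ptr<int> ptr;   // null when default-constructed, as py::object is
};

Handle MaybeCast(const Handle &obj, bool need_cast) {
  Handle out = obj;  // the fix; previously: Handle out; (null on the !need_cast path)
  if (need_cast) {
    out.ptr = std::make_shared<int>(*obj.ptr + 1);  // placeholder for the dtype cast
  }
  return out;
}

int main() {
  Handle h{std::make_shared<int>(41)};
  assert(MaybeCast(h, false).ptr != nullptr);  // pass-through stays valid
  assert(*MaybeCast(h, true).ptr == 42);
  return 0;
}
```

The companion tuple fix above is the same idea: the `else` branch now copies `tuple[i]` through instead of leaving `result[i]` unset.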
bool GetSignatureType(const PrimitivePyPtr &prim, std::vector<SignatureEnumDType> *dtypes) {
MS_EXCEPTION_IF_NULL(dtypes);
auto signature = prim->signatures();
bool has_sig_dtype = false;
(void)std::transform(signature.begin(), signature.end(), std::back_inserter(*dtypes),
@ -733,20 +736,29 @@ ValuePtr PynativeExecutor::GetForwardValue(const OpExecInfoPtr &op_exec_info) {
AnfNodePtr PynativeExecutor::MakeCNode(const OpExecInfoPtr &op_exec_info, std::vector<bool> *op_masks,
abstract::AbstractBasePtrList *args_spec_list) {
MS_EXCEPTION_IF_NULL(op_masks);
MS_EXCEPTION_IF_NULL(args_spec_list);
CNodePtr cnode = nullptr;
std::vector<AnfNodePtr> inputs;
auto prim = op_exec_info->py_primitive;
const auto &signature = prim->signatures();
inputs.push_back(NewValueNode(prim));
size_t size = op_exec_info->op_inputs.size();
auto sig_size = signature.size();
// ignore signature for cast op
if (sig_size > 0 && sig_size != size) {
MS_EXCEPTION(ValueError) << op_exec_info->op_name << " inputs size " << size << " does not match the requires "
<< "inputs size " << sig_size;
}
bool is_cast_op = (op_exec_info->op_name == "Cast");
if (!is_cast_op) {
const auto &signature = prim->signatures();
for (size_t i = 0; i < size; i++) {
auto obj = op_exec_info->op_inputs[i];
auto sig = SignatureEnumRW::kRWDefault;
if (signature.size() > 0) {
if (sig_size > 0) {
sig = signature[i].rw;
}
MS_LOG(DEBUG) << "check mix precision " << op_exec_info->op_name << " input " << i << " "
@ -1173,10 +1185,6 @@ py::tuple RunOp(const py::args &args) {
std::ostringstream oss;
trace::TraceGraphEval();
trace::GetEvalStackInfo(oss);
// call py::print to output function call stack to STDOUT, in case of output the log to file, the user can see
// these info from screen, no need to open log file to find these info
py::print(oss.str());
MS_LOG(ERROR) << oss.str();
PynativeExecutor::GetInstance()->Clean();
// re-throw this exception to Python interpreter to handle it
throw(py::error_already_set(ex));

View File

@ -406,8 +406,10 @@ void GPUProfiler::OpDataProducerBegin(const std::string op_name, void *stream) {
CHECK_CUDA_RET_WITH_ERROR(cudaEventRecord(op_event_start_, (CUstream)stream_),
"cudaEventRecord op event start failed");
op_host_time_start_ = GetHostTimeStamp();
op_cupti_time_start_ = GetCUPTITimeStamp();
} else {
op_host_time_start_ = GetHostTimeStamp();
op_cupti_time_start_ = GetCUPTITimeStamp();
}
SetRunTimeData(op_name, stream);
}
@ -431,7 +433,7 @@ void GPUProfiler::OpDataProducerEnd() {
}
MS_LOG(DEBUG) << "Host Time Elapsed(us)," << op_name_ << "," << op_time_elapsed;
SetRunTimeData(op_name_, op_time_elapsed);
SetRunTimeData(op_name_, op_host_time_start_, op_time_elapsed);
SetRunTimeData(op_name_, op_cupti_time_start_, op_time_elapsed);
}
void GPUProfiler::StopCUPTI() {

View File

@ -170,6 +170,7 @@ class GPUProfiler {
cudaEvent_t op_event_stop_;
uint64_t op_host_time_start_;
uint64_t op_host_time_stop_;
uint64_t op_cupti_time_start_;
std::string profile_data_path_;
};
} // namespace gpu

View File

@ -455,7 +455,7 @@ REGISTER_PYBIND_DEFINE(Tensor, ([](const py::module *m) {
>>> data.set_dtype(mindspore.int32)
mindspore.int32
)mydelimiter")
.def("set_cast_dtype", &Tensor::set_cast_dtype)
.def("set_cast_dtype", &Tensor::set_cast_dtype, py::arg("dtype") = nullptr)
.def("__str__", &Tensor::ToString)
.def("__repr__", &Tensor::ToStringRepr)
.def(py::pickle(

View File

@ -14,12 +14,12 @@ endif ()
if (ENABLE_CPU)
file(GLOB_RECURSE CPU_SRC_LIST RELATIVE ${CMAKE_CURRENT_SOURCE_DIR} "cpu/*.cc")
list(REMOVE_ITEM CPU_SRC_LIST "cpu/mpi/mpi_adapter.cc", "cpu/mpi/mpi_export.cc")
list(REMOVE_ITEM CPU_SRC_LIST "cpu/mpi/mpi_adapter.cc" "cpu/mpi/mpi_export.cc")
endif ()
if (ENABLE_MPI)
if (ENABLE_CPU)
file(GLOB_RECURSE MPI_SRC_LIST "cpu/mpi/mpi_adapter.cc", "cpu/mpi/mpi_export.cc")
file(GLOB_RECURSE MPI_SRC_LIST "cpu/mpi/mpi_adapter.cc" "cpu/mpi/mpi_export.cc")
set_property(SOURCE ${MPI_SRC_LIST}
PROPERTY COMPILE_DEFINITIONS SUBMODULE_ID=mindspore::SubModuleId::SM_DEVICE)
add_library(mpi_adapter SHARED ${MPI_SRC_LIST})

View File

@ -57,7 +57,6 @@ constexpr const char *kOpTypeOpDebug = "Opdebug";
namespace mindspore {
namespace device {
namespace ascend {
DataDumper::~DataDumper() {
ReleaseDevMem(&dev_load_mem_);
ReleaseDevMem(&dev_unload_mem_);
@ -116,7 +115,7 @@ void DataDumper::LoadDumpInfo() {
auto debugger = mindspore::Debugger::GetInstance();
MS_EXCEPTION_IF_NULL(debugger);
if (debugger->DebuggerBackendEnabled()) {
std::map<std::pair<uint32_t, uint32_t>, std::string> &stream_task_to_opname = debugger->GetStreamTaskToOpnameMap();
std::map<std::pair<uint32_t, uint32_t>, std::string> stream_task_to_opname = debugger->GetStreamTaskToOpnameMap();
// extract stream id, task id and opname from runtime_info_map for overflow detection
std::transform(runtime_info_map_.begin(), runtime_info_map_.end(),
std::inserter(stream_task_to_opname, stream_task_to_opname.end()),

View File

@ -52,8 +52,12 @@ const int kUnSupportMixedDataTypeIndex = -1;
bool MatchInferOutputDataType(const CNodePtr &cnode, const kernel::KernelBuildInfo &kernel_build_info) {
MS_EXCEPTION_IF_NULL(cnode);
// Check input data type
auto name = AnfAlgo::GetCNodeName(cnode);
for (size_t input_index = 0; input_index < kernel_build_info.GetInputNum(); ++input_index) {
TypeId input_origin_type = AnfAlgo::GetPrevNodeOutputInferDataType(cnode, input_index);
if (name == kDynamicRNNOpName && input_origin_type == kMetaTypeNone) {
continue;
}
if (kernel_build_info.GetInputDeviceType(input_index) != input_origin_type) {
return false;
}
@ -478,6 +482,9 @@ void SetTensorDeviceInfo(const kernel::KernelBuildInfo &selected_kernel_info, co
AnfAlgo::SetSelectKernelBuildInfo(builder->Build(), input_kernel_node.get());
continue;
}
if (selected_kernel_info.GetInputFormat(input_index) == kOpFormat_FRACTAL_ZN_LSTM) {
continue;
}
// we set special device info of a input tensor.
bool is_ref = false;
auto op_info = kernel::OpLib::FindOp(AnfAlgo::GetCNodeName(kernel_node), kernel::kTBE);

View File

@ -127,8 +127,12 @@ bool TaskGenerator::LaunchKernel(const CNodePtr &anf_node_ptr, uint32_t stream_i
auto kernel_mod = AnfAlgo::GetKernelMod(anf_node_ptr);
MS_EXCEPTION_IF_NULL(kernel_mod);
kernel_mod->set_kernel_name(anf_node_ptr->fullname_with_scope());
auto op_name = AnfAlgo::GetCNodeName(anf_node_ptr);
if (AnfAlgo::GetCNodeName(anf_node_ptr) != kAtomicAddrCleanOpName) {
for (size_t i = 0; i < AnfAlgo::GetInputTensorNum(anf_node_ptr); ++i) {
if (op_name == kDynamicRNNOpName && i == 3) {
continue;
}
auto real_input_index = AnfAlgo::GetRealInputIndex(anf_node_ptr, i);
auto device_address = AnfAlgo::GetPrevNodeOutputAddr(anf_node_ptr, real_input_index);
AddressPtr input = std::make_shared<Address>();

View File

@ -22,22 +22,68 @@
#include <functional>
#include "backend/kernel_compiler/kernel.h"
#include "runtime/device/cpu/cpu_device_address.h"
#include "runtime/device/cpu/cpu_memory_manager.h"
#include "utils/ms_context.h"
#include "backend/session/anf_runtime_algorithm.h"
#include "backend/session/session_basic.h"
#include "frontend/operator/ops.h"
#include "utils/shape_utils.h"
#include "utils/profile.h"
#ifdef MEM_REUSE_DEBUG
#include "backend/optimizer/mem_reuse/mem_reuse_checker.h"
#endif
namespace mindspore {
namespace device {
namespace cpu {
bool CPUKernelRuntime::Init() {
mem_manager_ = std::make_shared<CPUMemoryManager>();
MS_EXCEPTION_IF_NULL(mem_manager_);
return true;
}
const size_t INIT_NODE_REF = 1;
void CPUKernelRuntime::AssignKernelAddress(session::KernelGraph *kernel_graph) {
AssignValueNodeAddress(kernel_graph);
AssignInputNodeAddress(kernel_graph);
AssignKernelOutputAddress(kernel_graph);
resource_manager_.AssignMemory(kernel_graph);
auto context_ptr = MsContext::GetInstance();
MS_EXCEPTION_IF_NULL(context_ptr);
bool is_enable_mem_reuse = context_ptr->get_param<bool>(MS_CTX_ENABLE_MEM_REUSE);
if (is_enable_mem_reuse) {
AssignStaticMemoryOutput(kernel_graph);
AssignDynamicMemory(kernel_graph);
#ifdef MEM_REUSE_DEBUG
mindspore::memreuse::MemReuseChecker::GetInstance().CheckNormalIR(kernel_graph);
#endif
} else {
AssignKernelOutputAddress(kernel_graph);
static_cast<CPUMemoryManager *>(mem_manager_.get())->AssignMemory(kernel_graph);
}
}
void CPUKernelRuntime::AssignStaticMemoryOutput(const session::KernelGraph *graph) {
MS_EXCEPTION_IF_NULL(graph);
auto nodes = AnfAlgo::GetAllOutput(graph->output(), {prim::kPrimTupleGetItem});
std::vector<session::KernelWithIndex> non_communication_op;
// Assign communication op memory first.
for (const auto &node : nodes) {
auto item_with_index = AnfAlgo::VisitKernelWithReturnType(node, 0, true);
MS_EXCEPTION_IF_NULL(item_with_index.first);
if (!item_with_index.first->isa<CNode>() || !AnfAlgo::IsRealKernel(item_with_index.first)) {
continue;
}
if (AnfAlgo::IsCommunicationOp(item_with_index.first)) {
AssignCommunicationNodeMem(kStaticMem, item_with_index.first);
} else {
non_communication_op.emplace_back(item_with_index);
}
}
for (const auto &item_with_index : non_communication_op) {
AssignNodeOutputMem(kStaticMem, item_with_index.first, SizeToInt(item_with_index.second));
}
}
void CPUKernelRuntime::AssignValueNodeAddress(session::KernelGraph *kernel_graph) {
@ -73,7 +119,8 @@ void CPUKernelRuntime::AssignValueNodeAddress(session::KernelGraph *kernel_graph
tensor->data_type() == kNumberTypeInt64) {
address->ptr_ = tensor->data_c();
} else {
address->ptr_ = resource_manager_.MemMalloc(tensor_size);
MS_EXCEPTION_IF_NULL(mem_manager_);
address->ptr_ = static_cast<CPUMemoryManager *>(mem_manager_.get())->MemMalloc(tensor_size);
if (!address->SyncHostToDevice(data_shape, LongToSize(tensor->data().nbytes()), tensor->data_type(),
tensor->data_c())) {
MS_LOG(EXCEPTION) << "Value node sync host to device failed!";
@ -166,10 +213,11 @@ tensor::TensorPtr CPUKernelRuntime::CreatTensorForOutput(session::KernelGraph *k
tensor->set_sync_status(kNeedSyncDeviceToHostImmediately);
} else {
if (infer_type_id != device_type_id) {
MS_EXCEPTION_IF_NULL(mem_manager_);
size_t type_size = GetTypeByte(TypeIdToType(device_type_id));
ShapeVector data_shape = tensor->shape();
size_t tensor_size = std::accumulate(data_shape.begin(), data_shape.end(), type_size, std::multiplies<size_t>());
address->ptr_ = resource_manager_.MemMalloc(tensor_size);
address->ptr_ = static_cast<CPUMemoryManager *>(mem_manager_.get())->MemMalloc(tensor_size);
tensor->set_device_address(address);
tensor->set_sync_status(kNeedSyncDeviceToHostImmediately);
} else {
@ -241,10 +289,11 @@ void CPUKernelRuntime::BindInputOutput(session::KernelGraph *kernel_graph, const
tensor->data_type() == kNumberTypeInt32 || tensor->data_type() == kNumberTypeInt64) {
address->ptr_ = tensor->data_c();
} else {
MS_EXCEPTION_IF_NULL(mem_manager_);
ShapeVector data_shape = tensor->shape();
size_t tensor_size =
std::accumulate(data_shape.begin(), data_shape.end(), sizeof(float), std::multiplies<size_t>());
address->ptr_ = resource_manager_.MemMalloc(tensor_size);
address->ptr_ = static_cast<CPUMemoryManager *>(mem_manager_.get())->MemMalloc(tensor_size);
if (!address->SyncHostToDevice(data_shape, LongToSize(tensor->data().nbytes()), tensor->data_type(),
tensor->data_c())) {
MS_LOG(EXCEPTION) << "Parameter node sync host to device failed!";
@ -271,7 +320,8 @@ void CPUKernelRuntime::AddRuntimeAddress(DeviceAddress *address, std::vector<ker
kernel::AddressPtr input = std::make_shared<kernel::Address>();
MS_EXCEPTION_IF_NULL(input);
if (address->ptr_ == nullptr) {
address->ptr_ = resource_manager_.MemMalloc(address->size_);
MS_EXCEPTION_IF_NULL(mem_manager_);
address->ptr_ = static_cast<CPUMemoryManager *>(mem_manager_.get())->MemMalloc(address->size_);
}
MS_EXCEPTION_IF_NULL(address->ptr_);
input->addr = address->ptr_;
@ -280,16 +330,19 @@ void CPUKernelRuntime::AddRuntimeAddress(DeviceAddress *address, std::vector<ker
}
void CPUKernelRuntime::IncreaseSummaryRefCount(const session::NamedSummaryOutputs &summary_outputs) {
resource_manager_.IncreaseSummaryRefCount(summary_outputs);
MS_EXCEPTION_IF_NULL(mem_manager_);
static_cast<CPUMemoryManager *>(mem_manager_.get())->IncreaseSummaryRefCount(summary_outputs);
}
void CPUKernelRuntime::DecreaseSummaryRefCount(const session::NamedSummaryOutputs &summary_outputs) {
resource_manager_.DecreaseSummaryRefCount(summary_outputs);
MS_EXCEPTION_IF_NULL(mem_manager_);
static_cast<CPUMemoryManager *>(mem_manager_.get())->DecreaseSummaryRefCount(summary_outputs);
}
bool CPUKernelRuntime::Run(session::KernelGraph *kernel_graph, bool is_task_sink, Debugger *debugger) {
MS_EXCEPTION_IF_NULL(kernel_graph);
resource_manager_.IncreaseAddressRefCount(kernel_graph);
MS_EXCEPTION_IF_NULL(mem_manager_);
static_cast<CPUMemoryManager *>(mem_manager_.get())->IncreaseAddressRefCount(kernel_graph);
auto kernels = kernel_graph->execution_order();
for (const auto &kernel : kernels) {
@ -319,7 +372,7 @@ bool CPUKernelRuntime::Run(session::KernelGraph *kernel_graph, bool is_task_sink
AddRuntimeAddress(device_address, &kernel_workspaces);
}
auto ret = kernel_mod->Launch(kernel_inputs, kernel_workspaces, kernel_outputs, 0);
resource_manager_.DecreaseAddressRefCount(kernel);
static_cast<CPUMemoryManager *>(mem_manager_.get())->DecreaseAddressRefCount(kernel);
if (!ret) {
MS_LOG(EXCEPTION) << "Launch kernel failed.";
}

View File

@ -24,7 +24,6 @@
#include "runtime/device/kernel_runtime.h"
#include "backend/session/kernel_graph.h"
#include "backend/session/session_basic.h"
#include "runtime/device/cpu/cpu_resource_manager.h"
#include "backend/session/anf_runtime_algorithm.h"
#include "utils/any.h"
namespace mindspore {
@ -35,7 +34,7 @@ class CPUKernelRuntime : public KernelRuntime {
CPUKernelRuntime() = default;
~CPUKernelRuntime() override = default;
bool Init() override { return true; }
bool Init() override;
bool Run(session::KernelGraph *graph, bool is_task_sink, Debugger *debugger = nullptr) override;
void AssignKernelAddress(session::KernelGraph *kernel_graph);
void BindInputOutput(session::KernelGraph *kernel_graph, const std::vector<tensor::TensorPtr> &inputs,
@ -49,6 +48,7 @@ class CPUKernelRuntime : public KernelRuntime {
TypeId type_id) override;
private:
void AssignStaticMemoryOutput(const session::KernelGraph *graph);
tensor::TensorPtr CreatTensorForOutput(session::KernelGraph *kernel_graph, const CNodePtr &node, size_t index);
BaseRef CreatTensorForOutput(session::KernelGraph *kernel_graph, const session::KernelWithIndex &kernel_with_index);
@ -56,7 +56,6 @@ class CPUKernelRuntime : public KernelRuntime {
void AssignInputNodeAddress(const session::KernelGraph *kernel_graph);
void AssignKernelOutputAddress(const session::KernelGraph *kernel_graph);
void AddRuntimeAddress(DeviceAddress *address, std::vector<kernel::AddressPtr> *input_list);
CPUResourceManager resource_manager_;
std::set<DeviceAddressPtr> bound_addresses_;
std::map<AnfNodePtr, tensor::TensorPtr> input_param_tensor_map_;
};

View File

@ -1,5 +1,5 @@
/**
* Copyright 2019 Huawei Technologies Co., Ltd
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@ -13,15 +13,34 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "runtime/device/cpu/cpu_resource_manager.h"
#include "backend/session/anf_runtime_algorithm.h"
#include "runtime/device/cpu/cpu_memory_manager.h"
#include "backend/session/anf_runtime_algorithm.h"
#include "utils/ms_context.h"
#include "utils/convert_utils.h"
namespace mindspore {
namespace device {
namespace cpu {
CPUResourceManager::~CPUResourceManager() { MemFree(); }
void CPUResourceManager::MemFree() {
uint8_t *CPUMemoryManager::MallocStaticMem(size_t size, bool communication_mem) {
// For cpu, static memory is the same as dynamic memory
return MallocDynamicMem(size, communication_mem);
}
uint8_t *CPUMemoryManager::MallocDynamicMem(size_t size, bool) {
return (uint8_t *)MemMalloc(size);
}
void CPUMemoryManager::ResetDynamicMemory() {
for (auto &&iter : dynamic_mem_) {
free(iter.first);
}
dynamic_mem_.clear();
}
CPUMemoryManager::~CPUMemoryManager() { MemFree(); }
void CPUMemoryManager::MemFree() {
if (mem_ptr_ != nullptr) {
free(mem_ptr_);
mem_ptr_ = nullptr;
@ -34,7 +53,7 @@ void CPUResourceManager::MemFree() {
dynamic_mem_.clear();
}
void CPUResourceManager::AssignMemory(const session::KernelGraph *graph) {
void CPUMemoryManager::AssignMemory(const session::KernelGraph *graph) {
size_t graph_mem_size = mem_plan_.MemPlan(graph);
if (graph_mem_size > mem_size_) {
if (mem_size_ > 0) {
@ -43,6 +62,7 @@ void CPUResourceManager::AssignMemory(const session::KernelGraph *graph) {
}
mem_ptr_ = reinterpret_cast<uint8_t *>(malloc(graph_mem_size));
if (mem_ptr_ != nullptr) {
MS_LOG(INFO) << "Simple MemPlan GraphMemSize [" << graph_mem_size << "]";
mem_size_ = graph_mem_size;
dynamic_malloc_ = false;
} else {
@ -56,7 +76,7 @@ void CPUResourceManager::AssignMemory(const session::KernelGraph *graph) {
mem_plan_.MemAssign(graph, mem_ptr_);
}
void *CPUResourceManager::MemMalloc(size_t mem_size) {
void *CPUMemoryManager::MemMalloc(size_t mem_size) {
void *ptr = malloc(mem_size);
if (ptr != nullptr) {
memset_s(ptr, mem_size, 0, mem_size);
@ -67,7 +87,7 @@ void *CPUResourceManager::MemMalloc(size_t mem_size) {
}
}
void CPUResourceManager::MemFree(void *ptr) {
void CPUMemoryManager::MemFree(void *ptr) {
auto iter = dynamic_mem_.find(ptr);
if (iter != dynamic_mem_.end()) {
(void)dynamic_mem_.erase(iter);
@ -75,7 +95,7 @@ void CPUResourceManager::MemFree(void *ptr) {
}
}
void CPUResourceManager::IncreaseSummaryRefCount(const session::NamedSummaryOutputs &summary_outputs) {
void CPUMemoryManager::IncreaseSummaryRefCount(const session::NamedSummaryOutputs &summary_outputs) {
if (!dynamic_malloc_) {
return;
}
@ -93,7 +113,7 @@ void CPUResourceManager::IncreaseSummaryRefCount(const session::NamedSummaryOutp
}
}
void CPUResourceManager::DecreaseSummaryRefCount(const session::NamedSummaryOutputs &summary_outputs) {
void CPUMemoryManager::DecreaseSummaryRefCount(const session::NamedSummaryOutputs &summary_outputs) {
if (!dynamic_malloc_) {
return;
}
@ -115,7 +135,7 @@ void CPUResourceManager::DecreaseSummaryRefCount(const session::NamedSummaryOutp
}
}
void CPUResourceManager::IncreaseAddressRefCount(const session::KernelGraph *graph) {
void CPUMemoryManager::IncreaseAddressRefCount(const session::KernelGraph *graph) {
if (!dynamic_malloc_) {
return;
}
@ -140,7 +160,7 @@ void CPUResourceManager::IncreaseAddressRefCount(const session::KernelGraph *gra
}
}
void CPUResourceManager::DecreaseAddressRefCount(const AnfNodePtr &kernel) {
void CPUMemoryManager::DecreaseAddressRefCount(const AnfNodePtr &kernel) {
if (!dynamic_malloc_) {
return;
}

View File

@ -1,5 +1,5 @@
/**
* Copyright 2019 Huawei Technologies Co., Ltd
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@ -13,22 +13,27 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef MINDSPORE_CCSRC_RUNTIME_DEVICE_CPU_CPU_RESOURCE_MANAGER_H_
#define MINDSPORE_CCSRC_RUNTIME_DEVICE_CPU_CPU_RESOURCE_MANAGER_H_
#ifndef MINDSPORE_CCSRC_RUNTIME_DEVICE_CPU_CPU_MEMORY_MANAGER_H_
#define MINDSPORE_CCSRC_RUNTIME_DEVICE_CPU_CPU_MEMORY_MANAGER_H_
#include <vector>
#include <map>
#include "backend/session/kernel_graph.h"
#include "backend/session/session_basic.h"
#include "runtime/device/device_address.h"
#include "runtime/device/memory_manager.h"
#include "runtime/device/cpu/cpu_simple_mem_plan.h"
namespace mindspore {
namespace device {
namespace cpu {
class CPUResourceManager {
class CPUMemoryManager : public MemoryManager {
public:
CPUResourceManager() = default;
~CPUResourceManager();
CPUMemoryManager() = default;
virtual ~CPUMemoryManager();
void MallocDeviceMemory() override {}
void FreeDeviceMemory() override {}
void ResetDynamicMemory() override;
void AssignMemory(const session::KernelGraph *graph);
void IncreaseAddressRefCount(const session::KernelGraph *graph);
@ -38,6 +43,10 @@ class CPUResourceManager {
void IncreaseSummaryRefCount(const session::NamedSummaryOutputs &summary_outputs);
void DecreaseSummaryRefCount(const session::NamedSummaryOutputs &summary_outputs);
protected:
uint8_t *MallocStaticMem(size_t size, bool communication_mem) override;
uint8_t *MallocDynamicMem(size_t size, bool communication_mem) override;
private:
void MemFree();
CPUSimpleMemPlan mem_plan_;
@ -50,5 +59,4 @@ class CPUResourceManager {
} // namespace cpu
} // namespace device
} // namespace mindspore
#endif // MINDSPORE_CCSRC_RUNTIME_DEVICE_CPU_CPU_RESOURCE_MANAGER_H_
#endif // MINDSPORE_CCSRC_RUNTIME_DEVICE_CPU_CPU_MEMORY_MANAGER_H_

View File

@ -28,7 +28,7 @@ namespace mindspore {
namespace device {
namespace cpu {
class CPUSimpleMemPlan;
class CPUResourceManager;
class CPUMemoryManager;
class CPUKernelRuntime;
} // namespace cpu
namespace ascend {
@ -87,7 +87,7 @@ class DeviceAddress : public mindspore::DeviceSync {
friend class MemoryManager;
friend class mindspore::device::ascend::tasksink::TaskGenerator;
friend class mindspore::device::cpu::CPUSimpleMemPlan;
friend class mindspore::device::cpu::CPUResourceManager;
friend class mindspore::device::cpu::CPUMemoryManager;
friend class mindspore::device::cpu::CPUKernelRuntime;
friend class mindspore::device::gpu::GPUKernelRuntime;
friend class mindspore::device::gpu::GPUMemoryManager;

View File

@ -141,7 +141,6 @@ inline bool CheckNullInput(std::vector<size_t> input_shape) {
MS_LOG(EXCEPTION) << "CUAD curand Error: " << message << " | curandStatus: " << status; \
} \
}
} // namespace gpu
} // namespace device
} // namespace mindspore

View File

@ -25,7 +25,6 @@
namespace mindspore {
namespace device {
namespace gpu {
MPIInitializer &MPIInitializer::GetInstance() {
static MPIInitializer instance;
return instance;

View File

@ -58,7 +58,23 @@ int AnfConverter::ValidateFileStr(const std::string &modelFile, std::string file
bool AnfConverter::ReadOnnxFromBinary(const std::string &modelFile, google::protobuf::Message *onnx_model) {
std::unique_ptr<char> onnx_file(new (std::nothrow) char[PATH_MAX]{0});
int fd = open(modelFile.c_str(), O_RDONLY);
if (modelFile.size() > PATH_MAX) {
MS_LOG(DEBUG) << "file path " << modelFile << " is too long.";
return false;
}
char real_path[PATH_MAX + 1] = {0};
#if defined(_WIN32) || defined(_WIN64)
if (nullptr == _fullpath(real_path, modelFile.c_str(), PATH_MAX)) {
MS_LOG(DEBUG) << modelFile << " does not exist.";
return false;
}
#else
if (nullptr == realpath(modelFile.c_str(), real_path)) {
MS_LOG(DEBUG) << modelFile << " does not exist.";
return false;
}
#endif
int fd = open(real_path, O_RDONLY);
if (fd < 0) {
MS_LOG(EXCEPTION) << "failed to open file";
}
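
The added guard canonicalizes the user-supplied path before `open`, with `_fullpath` standing in for `realpath` on Windows. A self-contained sketch of the same portable wrapper; the `PATH_MAX` fallback is an assumption for platforms that leave it undefined:

```
#include <climits>
#include <cstdio>
#include <cstdlib>
#include <string>

#ifndef PATH_MAX
#define PATH_MAX 4096  // assumption: fallback where the platform omits it
#endif

bool CanonicalizePath(const std::string &path, std::string *out) {
  if (path.size() > PATH_MAX) {
    return false;  // refuse over-long inputs before touching the filesystem
  }
  char real_path[PATH_MAX + 1] = {0};
#if defined(_WIN32) || defined(_WIN64)
  if (_fullpath(real_path, path.c_str(), PATH_MAX) == nullptr) {
    return false;
  }
#else
  if (realpath(path.c_str(), real_path) == nullptr) {
    return false;  // nonexistent file, dangling symlink, etc.
  }
#endif
  *out = real_path;
  return true;
}

int main(int argc, char **argv) {
  std::string canon;
  if (argc > 1 && CanonicalizePath(argv[1], &canon)) {
    std::printf("%s\n", canon.c_str());
    return 0;
  }
  return 1;
}
```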

View File

@ -100,28 +100,27 @@ bool ProcessInner(const std::string &message, const int &bias, uint32_t *digest,
w[i] = sigma3(w[i - 2]) + w[i - 7] + sigma2(w[i - 15]) + w[i - 16];
}
std::shared_ptr<uint32_t> hash(new uint32_t[digest_size], std::default_delete<uint32_t[]>());
std::shared_ptr<uint32_t[]> hash(new uint32_t[digest_size]());
size_t mem_size = digest_size * sizeof(uint32_t);
auto ret = memcpy_s(hash.get(), mem_size, digest, mem_size);
if (ret != EOK) {
return false;
}
for (int i = 0; i < kIterationNumber; ++i) {
uint32_t t1 =
w[i] + constant[i] + hash.get()[7] + sigma1(hash.get()[4]) + ch(hash.get()[4], hash.get()[5], hash.get()[6]);
uint32_t t2 = sigma0(hash.get()[0]) + ma(hash.get()[0], hash.get()[1], hash.get()[2]);
uint32_t t1 = w[i] + constant[i] + hash[7] + sigma1(hash[4]) + ch(hash[4], hash[5], hash[6]);
uint32_t t2 = sigma0(hash[0]) + ma(hash[0], hash[1], hash[2]);
for (int j = digest_size - 1; j >= 0; --j) {
if (j == 4) {
hash.get()[j] = hash.get()[j - 1] + t1;
hash[j] = hash[j - 1] + t1;
} else if (j == 0) {
hash.get()[j] = t1 + t2;
hash[j] = t1 + t2;
} else {
hash.get()[j] = hash.get()[j - 1];
hash[j] = hash[j - 1];
}
}
}
for (int i = 0; i < digest_size; ++i) {
digest[i] += hash.get()[i];
digest[i] += hash[i];
}
return true;
}
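
The declaration change works because `std::shared_ptr<T[]>` (C++17) both applies `delete[]` automatically and provides `operator[]`, which is what lets every `hash.get()[i]` above collapse to `hash[i]`:

```
#include <cstdint>
#include <memory>

int main() {
  std::shared_ptr<uint32_t[]> hash(new uint32_t[8]());  // () value-initializes to zero
  hash[0] = 0x6a09e667u;  // operator[] comes with the array specialization
  // Pre-C++17 spelling needed a custom deleter and had no operator[]:
  //   std::shared_ptr<uint32_t> old(new uint32_t[8], std::default_delete<uint32_t[]>());
  //   old.get()[0] = 0x6a09e667u;
  return static_cast<int>(hash[0] & 0x1);
}
```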

View File

@ -219,6 +219,8 @@ constexpr auto kFusedScaleApplyMomentum = "FusedScaleApplyMomentum";
constexpr auto kBasicLSTMCellWeightGradOpName = "BasicLSTMCellWeightGrad";
constexpr auto kBasicLSTMCellInputGradOpName = "BasicLSTMCellInputGrad";
constexpr auto kBasicLSTMCellOpName = "BasicLSTMCell";
constexpr auto kDynamicRNNOpName = "DynamicRNN";
constexpr auto kLSTMInputGradOpName = "LSTMInputGrad";
// attr key name
constexpr auto kAttrInputNames = "input_names";

View File

@ -292,7 +292,6 @@ class _PynativeExecutor:
def __init__(self):
self._executor = PynativeExecutor_.get_instance()
#TODO(kpy):add a type arg
def new_graph(self, obj, *args, **kwargs):
self._executor.new_graph(obj, *args, *(kwargs.values()))

View File

@ -30,12 +30,14 @@ def set_seed(seed):
Note:
The global seed is used by numpy.random, mindspore.common.Initializer, mindspore.ops.composite.random_ops and
mindspore.nn.probability.distribution.
mindspore.nn.probability.distribution.
If global seed is not set, these packages will use their own default seed independently, numpy.random and
mindspore.common.Initializer will choose a random seed, mindspore.ops.composite.random_ops and
mindspore.nn.probability.distribution will use zero.
mindspore.common.Initializer will choose a random seed, mindspore.ops.composite.random_ops and
mindspore.nn.probability.distribution will use zero.
Seed set by numpy.random.seed() is only used by numpy.random, while the seed set by this API will also be used by
numpy.random, so setting all seeds by this API is recommended.
numpy.random, so setting all seeds by this API is recommended.
Args:
seed (int): The seed to be set.
@ -97,14 +99,13 @@ def _get_op_seed(op_seed, kernel_name):
seed (int): The op-seed to be updated.
kernel_name (string): The random op kernel.
"""
if ((kernel_name, op_seed) not in _KERNEL_SEED) or (_KERNEL_SEED[(kernel_name, op_seed)] == -1):
if (kernel_name, op_seed) not in _KERNEL_SEED:
_KERNEL_SEED[(kernel_name, op_seed)] = op_seed
_KERNEL_SEED[(kernel_name, op_seed)] = 0
return _KERNEL_SEED[(kernel_name, op_seed)]
def _reset_op_seed():
"""
Reset op seeds in the kernel's dictionary.
"""
for key in _KERNEL_SEED:
_KERNEL_SEED[key] = -1
for (kernel_name, op_seed) in _KERNEL_SEED:
_KERNEL_SEED[(kernel_name, op_seed)] = op_seed

View File

@ -104,6 +104,9 @@ inline const PrimitivePtr kPrimPad = std::make_shared<Primitive>("Pad");
inline const PrimitivePtr kPrimArgMaxWithValue = std::make_shared<Primitive>("ArgMaxWithValue");
inline const PrimitivePtr kPrimUnique = std::make_shared<Primitive>("Unique");
inline const PrimitivePtr kPrimUniqueGrad = std::make_shared<Primitive>("UniqueGrad");
inline const PrimitivePtr kPrimExtractImagePatches = std::make_shared<Primitive>("ExtractImagePatches");
inline const PrimitivePtr kPrimDynamicRNN = std::make_shared<Primitive>("DynamicRNN");
inline const PrimitivePtr kPrimDynamicRNNGrad = std::make_shared<Primitive>("DynamicRNNGrad");
// NN
inline const PrimitivePtr kPrimFlatten = std::make_shared<Primitive>("Flatten");
@ -213,6 +216,7 @@ inline const PrimitivePtr kPrimRound = std::make_shared<Primitive>("Round");
inline const PrimitivePtr kPrimExp = std::make_shared<Primitive>("Exp");
inline const PrimitivePtr kPrimLog = std::make_shared<Primitive>("Log");
inline const PrimitivePtr kPrimRsqrt = std::make_shared<Primitive>("Rsqrt");
inline const PrimitivePtr kPrimSplitV = std::make_shared<Primitive>("SplitV");
// Statements
inline const PrimitivePtr kPrimReturn = std::make_shared<Primitive>("return");

View File

@ -28,7 +28,6 @@
#include <algorithm>
#include <type_traits>
#include <typeinfo>
#include <regex>
#include "abstract/abstract_value.h"
@ -373,25 +372,17 @@ class TensorDataImpl : public TensorData {
std::is_same<T, double>::value) {
return str;
}
// Use regular expressions to replace placeholders.
std::regex pattern("#+");
std::smatch result;
int bias = 0;
std::string::const_iterator start = str.begin();
std::string::const_iterator end = str.end();
while (std::regex_search(start, end, result, pattern)) {
const int len = result.str(0).length();
const int pos = result.position();
bias += pos;
if (bias > static_cast<int32_t>(str.length())) {
return "";
// Replace # with placeholder.
size_t index = str.find('#');
while (index != str.npos) {
size_t pos = index;
while (str[pos] == '#') {
pos++;
}
// Replace # with placeholder.
int len = pos - index;
std::string space(max_width - len, ' ');
str = str.replace(bias, len, space);
// Update the starting position of the search.
start = str.begin() + bias;
end = str.end();
str = str.replace(index, len, space);
index = str.find('#', index);
}
return str;
}
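
A quick standalone check of the new scan-and-replace loop: each run of `#` placeholders becomes `max_width - len` spaces. This sketch adds an explicit bounds check that the in-tree loop gets implicitly from `std::string`'s guaranteed terminator, and assumes `max_width` is at least each run's length:

```
#include <cstdio>
#include <string>

std::string ExpandPlaceholders(std::string str, int max_width) {
  size_t index = str.find('#');
  while (index != str.npos) {
    size_t pos = index;
    while (pos < str.size() && str[pos] == '#') {  // measure the run of '#'
      pos++;
    }
    int len = static_cast<int>(pos - index);
    str = str.replace(index, len, std::string(max_width - len, ' '));
    index = str.find('#', index);  // resume after the replaced run
  }
  return str;
}

int main() {
  // A 2-char run pads to 4 spaces, a 1-char run to 5 (max_width = 6).
  std::printf("[%s]\n", ExpandPlaceholders("##3.14 #42", 6).c_str());
}
```

Unlike the regex version it replaces, this loop never rescans from the start of the string, so it stays linear in the string length.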

View File

@ -269,7 +269,7 @@ class Tensor : public MetaTensor {
std::string id() const { return id_; }
TypePtr cast_dtype() { return cast_dtype_; }
void set_cast_dtype(TypePtr dtype) { cast_dtype_ = dtype; }
void set_cast_dtype(TypePtr dtype = nullptr) { cast_dtype_ = dtype; }
void SetNeedWait(bool need_wait) {
if (event_ != nullptr) {

View File

@ -266,7 +266,7 @@ std::string KeywordArg::ToString() const {
buffer << "KeywordArg[";
buffer << "key : " << key_;
MS_EXCEPTION_IF_NULL(value_);
buffer << "value : " << value_->ToString();
buffer << ", value : " << value_->ToString();
buffer << "]";
return buffer.str();
}

View File

@ -2474,7 +2474,7 @@ class ConcatDataset(DatasetOp):
raise ValueError("The parameter num_shards of DistributedSampler must be positive int!")
if sampler.get_num_samples() is not None:
raise ValueError("The parameter num_samples of DistributedSampler must be set!")
raise ValueError("The parameter num_samples of DistributedSampler is not support to be set!")
self._sampler = _select_sampler(None, sampler, None, None, None)
cumulative_samples_nums = 0
@ -3256,14 +3256,13 @@ class SamplerFn:
# Event for end of epoch
if multi_process is True:
self.eoe = multiprocessing.Event()
self.eof = multiprocessing.Event()
else:
self.eoe = threading.Event()
self.eof = threading.Event()
# Create workers
for _ in range(num_worker):
if multi_process is True:
worker = _GeneratorWorkerMp(dataset, self.eoe, self.eof)
worker = _GeneratorWorkerMp(dataset, self.eoe)
else:
worker = _GeneratorWorkerMt(dataset, self.eoe, self.eof)
worker.daemon = True
@ -3304,15 +3303,40 @@ class SamplerFn:
def __del__(self):
self.eoe.set()
self.eof.set()
if self.multi_process is False:
self.eof.set()
for w in self.workers:
w.join()
def _generator_worker_loop(dataset, idx_queue, result_queue, eoe, eof):
def _generator_worker_loop_mp(dataset, idx_queue, result_queue, eoe):
"""
Multiprocessing or multithread generator worker process loop.
Multiprocessing generator worker process loop.
"""
while True:
# Fetch index, block
try:
idx = idx_queue.get()
except KeyboardInterrupt:
raise Exception("Generator worker receives KeyboardInterrupt")
if idx is None:
# When the queue has gone out of scope in the master process, a None item can be fetched from it.
# Upon receiving None, the worker process should check whether EOE is set.
assert eoe.is_set(), ""
return
# Fetch data; any exception from __getitem__ will terminate the worker and time out the master process
result = dataset[idx]
# Send data, block
try:
result_queue.put(result)
except KeyboardInterrupt:
raise Exception("Generator worker receives KeyboardInterrupt")
del result, idx
def _generator_worker_loop_mt(dataset, idx_queue, result_queue, eoe, eof):
"""
Multithread generator worker process loop.
"""
while True:
# Fetch index, block
@ -3360,7 +3384,7 @@ class _GeneratorWorkerMt(threading.Thread):
def __init__(self, dataset, eoe, eof):
self.idx_queue = queue.Queue(16)
self.res_queue = queue.Queue(16)
super().__init__(target=_generator_worker_loop, args=(dataset, self.idx_queue, self.res_queue, eoe, eof))
super().__init__(target=_generator_worker_loop_mt, args=(dataset, self.idx_queue, self.res_queue, eoe, eof))
def put(self, item):
"""
@ -3372,7 +3396,7 @@ class _GeneratorWorkerMt(threading.Thread):
"""
Get function for worker result queue. Block with timeout.
"""
return self.res_queue.get(timeout=10)
return self.res_queue.get(timeout=30)
class _GeneratorWorkerMp(multiprocessing.Process):
@ -3380,10 +3404,10 @@ class _GeneratorWorkerMp(multiprocessing.Process):
Worker process for multiprocess Generator.
"""
def __init__(self, dataset, eoe, eof):
def __init__(self, dataset, eoe):
self.idx_queue = multiprocessing.Queue(16)
self.res_queue = multiprocessing.Queue(16)
super().__init__(target=_generator_worker_loop, args=(dataset, self.idx_queue, self.res_queue, eoe, eof))
super().__init__(target=_generator_worker_loop_mp, args=(dataset, self.idx_queue, self.res_queue, eoe))
def put(self, item):
"""
@ -3395,17 +3419,18 @@ class _GeneratorWorkerMp(multiprocessing.Process):
"""
Get function for worker result queue. Block with timeout.
"""
while check_iterator_cleanup() is False:
try:
return self.res_queue.get(timeout=10)
except multiprocessing.TimeoutError:
continue
raise Exception("Generator worker process timeout")
# Relax the timeout from 10s to 30s, since 10s sometimes causes a "Generator worker process timeout"
# when too many iterators run with an infinite epoch (num_epoch=-1)
return self.res_queue.get(timeout=30)
def __del__(self):
self.terminate()
# Try to destruct here; sometimes the object itself has already been destructed in advance,
# in which case "self" will be a NoneType
try:
self.terminate()
except AttributeError:
pass
class GeneratorDataset(MappableDataset):

View File

@ -80,12 +80,44 @@ class Compose:
>>> dataset = ds.ImageFolderDataset(dataset_dir, num_parallel_workers=8)
>>> # create a list of transformations to be applied to the image data
>>> transform = py_transforms.Compose([py_vision.Decode(),
>>> py_vision.RandomHorizontalFlip(0.5),
>>> py_vision.ToTensor(),
>>> py_vision.Normalize((0.491, 0.482, 0.447), (0.247, 0.243, 0.262)),
>>> py_vision.RandomErasing()])
>>> py_vision.RandomHorizontalFlip(0.5),
>>> py_vision.ToTensor(),
>>> py_vision.Normalize((0.491, 0.482, 0.447), (0.247, 0.243, 0.262)),
>>> py_vision.RandomErasing()])
>>> # apply the transform to the dataset through dataset.map()
>>> dataset = dataset.map(operations=transform, input_columns="image")
>>>
>>> # Compose can also be invoked implicitly, by just passing in a list of ops
>>> # the above example then becomes:
>>> transform_list = [py_vision.Decode(),
>>> py_vision.RandomHorizontalFlip(0.5),
>>> py_vision.ToTensor(),
>>> py_vision.Normalize((0.491, 0.482, 0.447), (0.247, 0.243, 0.262)),
>>> py_vision.RandomErasing()]
>>>
>>> # apply the transform to the dataset through dataset.map()
>>> dataset = dataset.map(operations=transform_list, input_columns="image")
>>>
>>> # Certain C++ and Python ops can be combined, but not all of them
>>> # An example of combined operations
>>> import mindspore.dataset as ds
>>> import mindspore.dataset.transforms.c_transforms as c_transforms
>>> import mindspore.dataset.vision.c_transforms as c_vision
>>>
>>> data = ds.NumpySlicesDataset(arr, column_names=["cols"], shuffle=False)
>>> transformed_list = [py_transforms.OneHotOp(2), c_transforms.Mask(c_transforms.Relational.EQ, 1)]
>>> data = data.map(operations=transformed_list, input_columns=["cols"])
>>>
>>> # Here is an example of mixing vision ops
>>> data_dir = "/path/to/imagefolder_directory"
>>> data1 = ds.ImageFolderDataset(dataset_dir=data_dir, shuffle=False)
>>> input_columns = ["column_names"]
>>> op_list=[c_vision.Decode(),
>>> c_vision.Resize((224, 244)),
>>> py_vision.ToPIL(),
>>> np.array, # need to convert the PIL image to a NumPy array to pass it to a C++ operation
>>> c_vision.Resize((24, 24))]
>>> data1 = data1.map(operations=op_list, input_columns=input_columns)
"""
@check_compose_list

View File

@ -28,10 +28,10 @@ if (SUPPORT_TRAIN)
elseif (PLATFORM_ARM32)
set(COMPONENT_NAME runtime-arm32-${PROCESS_UNIT}-train)
elseif (WIN32)
set(PARSER_NAME libconverter-parser-win-${PROCESS_UNIT})
set(COMPONENT_NAME converter-win-${PROCESS_UNIT})
set(PARSER_NAME libconverter-parser-win-${PROCESS_UNIT}-train)
set(COMPONENT_NAME converter-win-${PROCESS_UNIT}-train)
else ()
set(COMPONENT_NAME converter-ubuntu)
set(COMPONENT_NAME converter-ubuntu-train)
endif()
set(RUN_X86_COMPONENT_NAME runtime-x86-${PROCESS_UNIT}-train)
else ()

View File

@ -0,0 +1,113 @@
#ifdef __arm__
#ifndef __aarch64__
.text
.align 5
.global ConvDwInt8PostAlign4PerChannel
#ifndef __APPLE__
.type ConvDwInt8PostAlign4PerChannel, %function
#endif
// void ConvDwInt8PostAlign4PerChannel(int8_t *dst, int32_t *buffer, int channel4, int32_t output_zp, int32_t *out_multiplier,
// int32_t *left_shift, int32_t *right_shift, int32_t acc_min, int32_t acc_max);
// r0: dst, r1: buffer, r2: num_pixels, r3: output_zp, r4: out_multiplier,
// r5: left_shift, r6: right_shift, r7: acc_min, r8: acc_max
ConvDwInt8PostAlign4PerChannel:
// At return, clang generates "push {lr}" / "pop {pc}" while gcc generates "bx lr".
// According to https://stackoverflow.com/questions/53625807,
// even if we jumped via the link register instead of saving it, we would still have to save it around subroutine calls anyway.
// Clang's rule seems simpler, though there are no subroutine calls here.
// r4-r8 and q4-q7 must be saved according to https://static.docs.arm.com/ihi0042/i/aapcs32.pdf
push {r4-r8, r10}
vpush {q4-q7}
add sp, sp, #88
vdup.32 q15, r3 // output_zp
ldr r4, [sp] // out_multiplier
ldr r5, [sp, #4] // left_shift
ldr r6, [sp, #8] // right_shift
ldr r7, [sp, #12] // acc_min
vdup.32 q11, r7
ldr r8, [sp, #16] // acc_max
vdup.32 q10, r8
mov r10, r0
LoopDepth8:
cmp r2, #8
blt End
vld1.32 {q0}, [r1]!
vld1.32 {q13}, [r5]!
vshl.s32 q0, q0, q13
vld1.32 {q14}, [r4]!
vqrdmulh.s32 q0, q0, q14
vld1.32 {q12}, [r6]!
vand q4, q0, q12
vshr.s32 q4, q4, #31
vqadd.s32 q0, q0, q4
vrshl.s32 q0, q0, q12
vadd.i32 q0, q0, q15
vmax.s32 q0, q0, q11
vmin.s32 q0, q0, q10
vqmovn.s32 d4, q0
vld1.32 {q1}, [r1]!
vld1.32 {q13}, [r5]!
vshl.s32 q1, q1, q13
vld1.32 {q14}, [r4]!
vqrdmulh.s32 q1, q1, q14
vld1.32 {q12}, [r6]!
vand q4, q1, q12
vshr.s32 q4, q4, #31
vqadd.s32 q1, q1, q4
vrshl.s32 q1, q1, q12
vadd.i32 q1, q1, q15
vmax.s32 q1, q1, q11
vmin.s32 q1, q1, q10
vqmovn.s32 d5, q1
vqmovn.s16 d4, q2
vst1.8 {d4}, [r10]!
sub r2, r2, #8
b LoopDepth8
LoopDepth4:
cmp r2, #4
blt End
vld1.32 {q0}, [r1]!
vld1.32 {q13}, [r5]!
vshl.s32 q0, q0, q13
vld1.32 {q14}, [r4]!
vqrdmulh.s32 q0, q0, q14
vld1.32 {q12}, [r6]!
vand q4, q0, q12
vshr.s32 q4, q4, #31
vqadd.s32 q0, q0, q4
vrshl.s32 q0, q0, q12
vadd.i32 q0, q0, q15
vmax.s32 q0, q0, q11
vmin.s32 q0, q0, q10
vqmovn.s32 d0, q0
vqmovn.s16 d0, q0
vst1.8 {d0[0]}, [r10]!
vst1.8 {d0[1]}, [r10]!
vst1.8 {d0[2]}, [r10]!
vst1.8 {d0[3]}, [r10]!
sub r2, r2, #4
b LoopDepth4
End:
sub sp, sp, #88
vpop {q4-q7}
pop {r4-r8, r10}
bx lr
#endif
#endif

View File

@ -18,7 +18,7 @@
#include <string.h>
#include "nnacl/errorcode.h"
int ExpandDims(float *input_ptr, float *output_ptr, size_t data_size) {
int ExpandDims(void *input_ptr, void *output_ptr, size_t data_size) {
memcpy(output_ptr, input_ptr, data_size);
return NNACL_OK;
}

View File

@ -27,7 +27,7 @@ typedef struct ExpandDimsParameter {
#ifdef __cplusplus
extern "C" {
#endif
int ExpandDims(float *input_ptr, float *output_ptr, size_t data_size);
int ExpandDims(void *input_ptr, void *output_ptr, size_t data_size);
#ifdef __cplusplus
}
#endif

View File

@ -1,26 +0,0 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "nnacl/fp32/leaky_relu.h"
void DoLeakyRelu(float *input, float *output, LeakyReluParameter *param, int task_id) {
for (int i = task_id; i < param->input_num_; i += param->op_parameter_.thread_num_) {
if (input[i] <= 0) {
output[i] = input[i] * param->slope_[0];
} else {
output[i] = input[i];
}
}
}

View File

@ -1,30 +0,0 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef MINDSPORE_LITE_NNACL_PRELU_H_
#define MINDSPORE_LITE_NNACL_PRELU_H_
#include "nnacl/op_base.h"
#include "nnacl/leaky_relu_parameter.h"
#ifdef __cplusplus
extern "C" {
#endif
void DoLeakyRelu(float *input, float *output, LeakyReluParameter *prelu_param_, int task_id);
#ifdef __cplusplus
}
#endif
#endif // MINDSPORE_LITE_NNACL_PRELU_H_

View File

@ -19,7 +19,7 @@
#include "nnacl/errorcode.h"
int AvgPooling(const float *input_ptr, float *output_ptr, PoolingParameter *pooling_param, int task_id, float minf,
float maxf) {
float maxf) {
int stride_w = pooling_param->stride_w_;
int stride_h = pooling_param->stride_h_;
int pad_w = pooling_param->pad_l_;
@ -137,7 +137,7 @@ int AvgPooling(const float *input_ptr, float *output_ptr, PoolingParameter *pool
real_count = window;
}
if (real_count == 0) {
return NNACL_ERR;
return NNACL_ERR;
}
tmp_avg = tmp_avg / (float)real_count;
tmp_avg = fmax(tmp_avg, minf);

View File

@ -28,7 +28,7 @@
extern "C" {
#endif
int AvgPooling(const float *input_ptr, float *output_ptr, PoolingParameter *pooling_param, int task_id, float minf,
float maxf);
float maxf);
void MaxPooling(const float *input_ptr, float *output_ptr, PoolingParameter *pooling_param, int task_id, float minf,
float maxf);
#ifdef __cplusplus

View File

@ -35,6 +35,9 @@ void PostFuncInt8C4(const int32_t *in, const int32_t *bias, int8_t *out, size_t
#ifdef ENABLE_ARM
void ConvDwInt8Row(int32_t *output_ptr, const int8_t *input_ptr, const int16_t *weight_ptr, int num_pixels,
int output_channel, int input_step, int8_t input_zp);
void ConvDwInt8PostAlign4PerChannel(int8_t *dst, int32_t *buffer, int channel4, int32_t output_zp,
int32_t *out_multiplier, int32_t *left_shift, int32_t *right_shift, int32_t acc_min,
int32_t acc_max);
void ConvDwInt8PostAlign4(int8_t *dst, int32_t *buffer, int num_pixels, int32_t output_zp, int32_t out_multiplier,
int32_t left_shift, int32_t right_shift, int32_t acc_min, int32_t acc_max);
void IndirectGemmInt16to32_8x4(int32_t *dst, const int16_t *src, const int16_t *weight, size_t ksize, size_t ic8,
@ -64,9 +67,6 @@ void IndirectGemmInt8_4x4(int8_t *output, const int8_t *input, const int8_t *wei
void DeconvDwInt8Center(int32_t *dst, const int16_t *src, const int16_t *weight, size_t height, size_t width,
size_t kernel_h, size_t kernel_w, size_t out_h_step, size_t block_channel, size_t in_sh_step,
size_t in_sw_step, size_t in_kh_step, size_t in_kw_step);
void ConvDwInt8PostAlign4PerChannel(int8_t *dst, int32_t *buffer, int channel4, int32_t output_zp,
int32_t *out_multiplier, int32_t *left_shift, int32_t *right_shift, int32_t acc_min,
int32_t acc_max);
#endif
#ifdef __cplusplus

View File

@ -39,7 +39,7 @@ void ConvDwInt8Post(int8_t *dst, int32_t *buffer, int output_w, int channel, int
// supports per-channel quantization
for (int w = 0; w < output_w; w++) {
int channel4 = 0;
#ifdef ENABLE_ARM64
#ifdef ENABLE_ARM
channel4 = channel / 4 * 4;
ConvDwInt8PostAlign4PerChannel(dst, buffer, channel4, output_zp, out_multiplier, left_shift, right_shift, acc_min,
acc_max);

View File

@ -17,8 +17,7 @@
#include "nnacl/int8/leaky_relu_int8.h"
#include "nnacl/errorcode.h"
int DoLeakReluInt8(int8_t *inputs, int8_t *output_ptr, LeakyReluQuantArg *quant_prelu_parm, QuantArg *input_quant,
int task_id) {
int DoLeakReluInt8(int8_t *inputs, int8_t *output_ptr, LeakyReluQuantArg *quant_prelu_parm, int task_id) {
if (quant_prelu_parm == NULL) {
return NNACL_NULL_PTR;
}
@ -27,17 +26,12 @@ int DoLeakReluInt8(int8_t *inputs, int8_t *output_ptr, LeakyReluQuantArg *quant_
const float output_inverse_scale = 1.f / output_scale;
int output_dim = quant_prelu_parm->input_dim_;
float scale = quant_prelu_parm->quant_arg.in_args_.scale_ * output_inverse_scale;
float bias = -quant_prelu_parm->quant_arg.in_args_.zp_ * scale;
for (int i = 0; i < output_dim; i++) {
input_quant[i].scale_ = quant_prelu_parm->quant_arg.in_args_.scale_;
input_quant[i].zp_ = quant_prelu_parm->quant_arg.in_args_.zp_;
}
for (int i = 0; i < output_dim; i++) {
float scale = input_quant[i].scale_ * output_inverse_scale;
float bias = -input_quant[i].zp_ * scale;
for (int j = task_id; j < quant_prelu_parm->element_num; j += quant_prelu_parm->op_parameter_.thread_num_) {
if (inputs[j] <= 0) {
int32_t output_tmp = round(inputs[j] * quant_prelu_parm->slope_[0] * scale + bias) + output_zp;
int32_t output_tmp = round(inputs[j] * quant_prelu_parm->slope_ * scale + bias) + output_zp;
if (output_tmp > 127) {
output_ptr[j] = 127;
} else if (output_tmp < -128) {
@ -57,6 +51,5 @@ int DoLeakReluInt8(int8_t *inputs, int8_t *output_ptr, LeakyReluQuantArg *quant_
}
}
}
free(input_quant);
return NNACL_OK;
}

View File

@ -23,8 +23,7 @@
#ifdef __cplusplus
extern "C" {
#endif
int DoLeakReluInt8(int8_t *inputs, int8_t *output_ptr, LeakyReluQuantArg *quant_Prelu_parm, QuantArg *input_quant,
int task_id);
int DoLeakReluInt8(int8_t *inputs, int8_t *output_ptr, LeakyReluQuantArg *quant_Prelu_parm, int task_id);
#ifdef __cplusplus
}
#endif
