Compare commits

...

1105 Commits

Author SHA1 Message Date
mindspore-ci-bot 1e84d77969 !15320 fix naml compile error
From: @yuzhenhua666
Reviewed-by: @wuxuejian,@oacjiewen
Signed-off-by: @wuxuejian
2021-04-17 14:16:16 +08:00
mindspore-ci-bot b869325005 !15309 modify topk adapter for r1.2
From: @changzherui
Reviewed-by: @oacjiewen,@kingxian
Signed-off-by: @kingxian
2021-04-17 12:55:27 +08:00
yuzhenhua 25770b2680 fix compile bug 2021-04-17 11:40:00 +08:00
mindspore-ci-bot ea011a678c !15281 remove transpose op check supported
From: @liubuyu
Reviewed-by: @kisnwang,@jjfeing
Signed-off-by: @jjfeing
2021-04-17 10:40:34 +08:00
changzherui 9acd80d0d3 modify topk adapter for r1.2 2021-04-17 10:18:09 +08:00
mindspore-ci-bot 00d199de25 !15270 fixed occupy error when running UT with ASAN
From: @anancds
Reviewed-by: @cristoval,@limingqi107
Signed-off-by: @limingqi107
2021-04-17 09:52:00 +08:00
mindspore-ci-bot 02a83838c4 !15273 fix 310 inference build bug
From: @yuzhenhua666
Reviewed-by: @c_34,@oacjiewen
Signed-off-by: @c_34
2021-04-16 17:15:49 +08:00
liubuyu 03fa52d213 remove transpose check supported 2021-04-16 17:03:53 +08:00
mindspore-ci-bot d78b4d8f29 !15264 [GraphKernel] fix static check warning about memory leak
From: @dayschan
Reviewed-by: @gaoxiong1,@ckey_dou
Signed-off-by: @ckey_dou
2021-04-16 16:02:56 +08:00
yuzhenhua e085cee42e fix 310 build bug 2021-04-16 15:13:33 +08:00
chendongsheng f8c62714a3 delete http client and server ut 2021-04-16 14:58:37 +08:00
mindspore-ci-bot a2ed03d1e8 !15223 fix cpu op Equal bug when dtype is int8/uint8 etc.
From: @caojian05
Reviewed-by: 
Signed-off-by:
2021-04-16 14:57:03 +08:00
mindspore-ci-bot 08ae993429 !15247 strategy_ckpt_file_adapt_optimizer_shard_error_fix_r1.2
From: @yao_yf
Reviewed-by: @stsuteng,@yangzhenzhang,@stsuteng
Signed-off-by: @stsuteng,@stsuteng
2021-04-16 14:46:05 +08:00
mindspore-ci-bot 4cd3362bb6 !15241 [MSLITE] Fix bug of memory leakage.
From: @wang_shaocong
Reviewed-by: @hangangqiang,@zhanghaibo5
Signed-off-by: @hangangqiang
2021-04-16 14:09:43 +08:00
dayschan eb9db6a074 fix static check warning about memory leak 2021-04-16 11:53:10 +08:00
CaoJian 65f0a3a255 fix cpu op Equal bug when dtype is int8/uint8 etc. 2021-04-16 11:37:32 +08:00
yao_yf 6e2222a30e fix node find param 2021-04-16 11:31:34 +08:00
wang_shaocong 8fe60a46d8 [MSLITE] fix bug of memory leakage. 2021-04-15 20:35:11 +08:00
mindspore-ci-bot 46a500294a !15230 Fix output of BiasAddGrad.
From: @liu_xiao_93
Reviewed-by: @liangchenghui,@jjfeing
Signed-off-by: @liangchenghui
2021-04-15 20:08:37 +08:00
mindspore-ci-bot 5158d0fe3c !15219 fix accuracy of fasterrcnn below 62.5%
From: @yuzhenhua666
Reviewed-by: @linqingke,@c_34
Signed-off-by: @linqingke,@c_34
2021-04-15 19:55:48 +08:00
mindspore-ci-bot 6a9b1491a6 !15229 [MSLITE] update codegen CI script
From: @zhujingxuan
Reviewed-by: @wangchengyuan,@zhanghaibo5
Signed-off-by: @wangchengyuan
2021-04-15 19:21:58 +08:00
liuxiao93 f08b56140c fix output of BiasAddGrad. 2021-04-15 17:34:28 +08:00
zhujingxuan 140937c63c update codegen example script 2021-04-15 17:01:38 +08:00
mindspore-ci-bot 9039737dd9 !15215 Update releasenote of akg from 1.2.rc to 1.2.0
From: @anyrenwei
Reviewed-by: @ckey_dou,@dylangeng
Signed-off-by: @dylangeng
2021-04-15 16:53:06 +08:00
mindspore-ci-bot 4d0dcb371b !15201 change demo download url to r1.2
From: @yeyunpeng2020
Reviewed-by: @hangangqiang,@zhanghaibo5
Signed-off-by: @hangangqiang
2021-04-15 16:31:34 +08:00
yuzhenhua dcfcbdb7fc fix accuracy of fasterrcnn below 62.5% 2021-04-15 15:24:12 +08:00
anyrenwei f8785aeb53 update releasenote of akg from r1.2.rc to r1.2.0 2021-04-15 14:54:50 +08:00
mindspore-ci-bot 4b09587d0b !15197 update the version from 1.2.0-rc1 to 1.2.0
From: @luopengting
Reviewed-by: @majorzhang,@lilongfei15,@xsmq
Signed-off-by: @lilongfei15
2021-04-15 12:02:52 +08:00
mindspore-ci-bot 26611f0c3a !15194 Adaptation run package 0415
From: @shenwei41
Reviewed-by: @lilongfei15,@xsmq
Signed-off-by: @xsmq
2021-04-15 11:57:20 +08:00
mindspore-ci-bot bc04567ef5 !15186 modify some readme file
From: @zhanghuiyao
Reviewed-by: @c_34,@oacjiewen
Signed-off-by: @c_34
2021-04-15 11:56:44 +08:00
mindspore-ci-bot eeaf8d1db0 !15198 update the version from 1.2.0-rc1 to 1.2.0 in ReleaseNotes
From: @luopengting
Reviewed-by: @lilongfei15,@majorzhang
Signed-off-by: @lilongfei15
2021-04-15 10:59:09 +08:00
yeyunpeng2020 efcdd5ca3f change demo download url to r1.2 2021-04-15 10:53:55 +08:00
luopengting 2807144c3b update the version from 1.2.0-rc1 to 1.2.0 in ReleaseNotes 2021-04-15 10:30:49 +08:00
luopengting b81bdc5d58 update the version from 1.2.0-rc1 to 1.2.0 2021-04-15 10:19:01 +08:00
shenwei41 e8150ea2e5 Adaptation run package 0415 2021-04-15 10:05:25 +08:00
mindspore-ci-bot 4f4c43ca85 !15144 fix codedex
From: @limingqi107
Reviewed-by: @cristoval,@wilfchen
Signed-off-by: @wilfchen
2021-04-15 09:26:13 +08:00
mindspore-ci-bot 5148bb21dd !15116 add adapter of Conv3D and Conv3DTranspose operators for graphengine.
From: @wangshuide2020
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-04-15 08:58:56 +08:00
mindspore-ci-bot 6c5219a98a !15183 Update version of AKG to r1.2.0
From: @anyrenwei
Reviewed-by: @ckey_dou,@gaoxiong1
Signed-off-by: @gaoxiong1
2021-04-15 08:58:48 +08:00
wangshuide2020 af66bc9026 add adapter of Conv3D and Conv3DTranspose operators for graphengine. 2021-04-14 21:41:39 +08:00
zhanghuiyao 318187894b modify some readme file 2021-04-14 21:30:32 +08:00
mindspore-ci-bot bd14e9a3a7 !15094 clean code for CPU ops
From: @wanyiming
Reviewed-by: @oacjiewen,@wuxuejian
Signed-off-by: @wuxuejian
2021-04-14 21:15:30 +08:00
mindspore-ci-bot fc41e6806f !15167 r1.2:Adapt MaxPool3DGrad.
From: @liu_xiao_93
Reviewed-by: @liangchenghui,@oacjiewen
Signed-off-by: @liangchenghui
2021-04-14 20:34:39 +08:00
mindspore-ci-bot 46cb019638 !15165 modify log format for 310 inference
From: @yuzhenhua666
Reviewed-by: @oacjiewen,@linqingke
Signed-off-by: @linqingke
2021-04-14 20:28:41 +08:00
mindspore-ci-bot f9efbf571c !15012 modify codeDEX_C for R1.2
From: @Somnus2020
Reviewed-by: @kingxian
Signed-off-by: @kingxian
2021-04-14 19:40:53 +08:00
anyrenwei 26016dc9b4 UPDATE releaseNote1.2 of AKG 2021-04-14 19:18:34 +08:00
mindspore-ci-bot 627fbb7137 !15158 fix codex check
From: @lingyunli63
Reviewed-by: @gaoxiong1,@ckey_dou
Signed-off-by: @ckey_dou
2021-04-14 17:21:04 +08:00
mindspore-ci-bot 938d91ea17 !14835 Memory reuse switch
From: @ding_fei_fei
Reviewed-by: @kingxian
Signed-off-by: @kingxian
2021-04-14 16:36:56 +08:00
mindspore-ci-bot 606f119288 !15159 Update document of MulNoNan
From: @dinglinhe123
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-04-14 16:31:11 +08:00
mindspore-ci-bot f3155e4bd8 !15156 update example of matrixinverse
From: @mind-lh
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-04-14 16:26:02 +08:00
liuxiao93 1812bd137f Adapt MaxPool3DGrad 2021-04-14 16:02:38 +08:00
mindspore-ci-bot d41eaefc90 !15141 set_numa_enable and get_numa_enable are not in the API doc
From: @xiefangqi
Reviewed-by: @heleiwang,@liucunwei
Signed-off-by: @liucunwei
2021-04-14 15:54:01 +08:00
lilei 9100992dfe modify codeDEX_C for R1.2 2021-04-14 15:30:39 +08:00
yuzhenhua 15f43f8e8e modify log format for 310 inference 2021-04-14 15:28:34 +08:00
mindspore-ci-bot cc8ae42862 !15138 fix bot check
From: @lingyunli63
Reviewed-by: @ckey_dou,@gaoxiong1
Signed-off-by: @gaoxiong1
2021-04-14 15:19:50 +08:00
mindspore-ci-bot b71b0f84e9 !15151 fix queue.empty block for summary
From: @jiang-shuqiang
Reviewed-by: @wenkai_dist,@lilongfei15
Signed-off-by: @lilongfei15
2021-04-14 15:05:39 +08:00
lingyunli63 014981c550 fix codex format 2021-04-14 15:05:27 +08:00
mindspore-ci-bot 443d2a2029 !15134 fix DeepText 310 inference datatype bug
From: @yuzhenhua666
Reviewed-by: @oacjiewen,@c_34
Signed-off-by: @c_34
2021-04-14 14:51:29 +08:00
mindspore-ci-bot 210a4a2490 !15129 fix ctpn vgg16 backbone training
From: @qujianwei
Reviewed-by: @oacjiewen,@c_34
Signed-off-by: @c_34
2021-04-14 14:50:21 +08:00
mindspore-ci-bot 8f4b49fd16 !15132 fix no attr name bug
From: @chujinjin
Reviewed-by: @jjfeing,@kisnwang
Signed-off-by: @kisnwang
2021-04-14 14:39:02 +08:00
liuhe f59fbe1cbe update example of MatrixInverse 2021-04-14 14:30:38 +08:00
mindspore-ci-bot e4cd704962 !15083 strategy ckpt adapt optm shard
From: @yao_yf
Reviewed-by: @stsuteng,@yangzhenzhang
Signed-off-by: @stsuteng
2021-04-14 14:29:48 +08:00
dinglinhe e6a13acb66 Update document of MulNoNan 2021-04-14 14:25:48 +08:00
jiangshuqiang 24f6da5984 fix queue.empty block for summary 2021-04-14 12:05:37 +08:00
mindspore-ci-bot a19f35ab14 !15121 change log level from info to warning when use dump in PyNative mode
From: @zhangbuxue
Reviewed-by: @ginfung,@guoqi1024
Signed-off-by: @jjfeing
2021-04-14 11:35:50 +08:00
xiefangqi cbd88f8b85 add numa interface to api docs 2021-04-14 11:31:58 +08:00
mindspore-ci-bot c29e6264c9 !15136 update example of sparsesoftmaxcrossentropywithlogits
From: @mind-lh
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-04-14 11:26:20 +08:00
lingyunli63 067a038ae6 fix bot 2021-04-14 11:22:06 +08:00
liuhe 95c7648e66 update example of sparsesoftmaxcrossentropywithlogits 2021-04-14 11:07:47 +08:00
limingqi107 6b926e99eb fix codedex 2021-04-14 10:17:41 +08:00
yuzhenhua c2a7ee495c fix deeptext 310 datatype bug 2021-04-14 10:12:59 +08:00
chujinjin 195f094c97 fix no attr name 2021-04-14 10:00:48 +08:00
mindspore-ci-bot d5d65732f1 !15112 fixed codex
From: @anancds
Reviewed-by: @cristoval,@limingqi107
Signed-off-by: @limingqi107
2021-04-14 09:38:40 +08:00
mindspore-ci-bot 250c3c6b17 !15098 fix CodeDex warnings
From: @zyli2020
Reviewed-by: @cristoval,@cristoval,@limingqi107
Signed-off-by: @limingqi107
2021-04-14 09:37:20 +08:00
mindspore-ci-bot 8c81ff8722 !14609 [MSLITE][Develop] Modify lite build and package script
From: @sunsuodong
Reviewed-by: @zhanghaibo5,@xsmq
Signed-off-by: @zhanghaibo5
2021-04-14 09:36:33 +08:00
mindspore-ci-bot 55e392ed0c !15081 clean mindrt cpplint warning
From: @wangcong666
Reviewed-by: @kisnwang,@limingqi107
Signed-off-by: @limingqi107
2021-04-14 09:33:26 +08:00
qujianwei ee023b2e02 fix ctpn backbone vgg16 init 2021-04-14 09:24:14 +08:00
mindspore-ci-bot 30757fe1e2 !15105 fix DeepText 310 inference bug and fasterrcnn bug
From: @yuzhenhua666
Reviewed-by: @c_34,@oacjiewen
Signed-off-by: @c_34
2021-04-14 09:23:33 +08:00
mindspore-ci-bot 49e601afe9 !15073 Increase tbe compile process timeout to 600s
From: @laiyongqiang
Reviewed-by: @kisnwang,@jjfeing
Signed-off-by: @jjfeing
2021-04-14 09:15:07 +08:00
sunsuodong 42208998ec modify package name 2021-04-14 08:56:31 +08:00
dingpeifei db07f1c63d Memory reuse switch 2021-04-13 22:22:57 +08:00
mindspore-ci-bot e93cd6df85 !14154 fix executor last task not clear
From: @kisnwang
Reviewed-by: @zhoufeng54,@chujinjin
Signed-off-by:
2021-04-13 22:21:03 +08:00
buxue e44aa43fe4 change log level from info to warning when use dump in PyNative mode 2021-04-13 22:04:47 +08:00
chendongsheng c58c95b174 fixed codex 2021-04-13 21:52:23 +08:00
mindspore-ci-bot 5d2bc1cdf6 !15088 update unet
From: @zhao_ting_v
Reviewed-by: @wuxuejian,@c_34
Signed-off-by: @wuxuejian
2021-04-13 21:33:40 +08:00
lizhenyu 0e0738a3c9 fix CodeDex warnings 2021-04-13 21:11:51 +08:00
mindspore-ci-bot 3e32a0361c !15082 [graph kernel] clean code for expanders.
From: @chenlei_autodiff
Reviewed-by: @ckey_dou,@gaoxiong1
Signed-off-by: @gaoxiong1
2021-04-13 21:04:17 +08:00
mindspore-ci-bot c8366f4474 !15079 [GraphKernel] Make code meet the criterion in r1.2 branch.
From: @tronzhang
Reviewed-by: @ckey_dou,@gaoxiong1
Signed-off-by: @gaoxiong1
2021-04-13 21:03:39 +08:00
wangcong a986c8bd18 fix mindrt cpplint warning 2021-04-13 20:25:15 +08:00
mindspore-ci-bot 3cd99505c9 !15078 update micro example
From: @zhujingxuan
Reviewed-by: @wangchengyuan
Signed-off-by: @wangchengyuan
2021-04-13 20:21:31 +08:00
mindspore-ci-bot cf711058be !15067 Adaptation run package 0412 to r1.2
From: @shenwei41
Reviewed-by: @zhoufeng54,@xsmq
Signed-off-by: @xsmq
2021-04-13 20:17:02 +08:00
mindspore-ci-bot b2de0c9007 !15045 fix code review alarms
From: @lihongkang1
Reviewed-by: @wuxuejian,@liangchenghui
Signed-off-by: @wuxuejian
2021-04-13 20:09:23 +08:00
yuzhenhua 6208c5e47f fix deeptext 310 inference bug and fasterrcnn bug 2021-04-13 19:57:44 +08:00
mindspore-ci-bot 7a67f28258 !15075 add print function in example
From: @lijiaqi0612
Reviewed-by: @kingxian,@kisnwang
Signed-off-by: @kingxian
2021-04-13 19:33:42 +08:00
mindspore-ci-bot 205d342f24 !15074 fix code review alarms
From: @shibeiji
Reviewed-by: @wuxuejian,@liangchenghui
Signed-off-by: @wuxuejian
2021-04-13 19:24:21 +08:00
zhujingxuan 04e15f0c1f update micro example 2021-04-13 19:21:42 +08:00
mindspore-ci-bot df0d5a99be !15054 clean code
From: @hwjiaorui
Reviewed-by: @jjfeing,@zhoufeng54
Signed-off-by: @jjfeing
2021-04-13 19:17:18 +08:00
mindspore-ci-bot 1977d8e256 !15065 change quantum simulator position
From: @donghufeng
Reviewed-by: @zhoufeng54,@kisnwang
Signed-off-by: @kisnwang
2021-04-13 19:04:37 +08:00
mindspore-ci-bot 5904836ee8 !15086 update document of BasicLSTMCell
From: @mind-lh
Reviewed-by: @liangchenghui,@liangchenghui
Signed-off-by: @liangchenghui,@liangchenghui
2021-04-13 18:46:19 +08:00
mindspore-ci-bot 817ff44ef5 !15038 Fix cache client status reset and final flush for r1.2 branch
From: @lixiachen
Reviewed-by: @liucunwei,@heleiwang
Signed-off-by: @liucunwei
2021-04-13 17:43:28 +08:00
liuhe 241d684835 update document of deprecated BasicLSTMCell 2021-04-13 17:37:08 +08:00
yao_yf f1f50dd3d3 strategy_ckpt_file_adapt_optimizer_shard 2021-04-13 17:34:12 +08:00
mindspore-ci-bot cbcb4ad5ae !15022 fix opencl conv2d test
From: @zhaodezan
Reviewed-by: @zhanghaibo5,@hangangqiang
Signed-off-by: @zhanghaibo5
2021-04-13 17:24:35 +08:00
mindspore-ci-bot f0bd06dbed !15063 Do not match when there is more than one primal call for J user
From: @huangbingjian
Reviewed-by: @ginfung,@zh_qh
Signed-off-by: @zh_qh
2021-04-13 17:11:04 +08:00
zhaoting b5260493f0 update unet 2021-04-13 17:08:29 +08:00
wanyiming 42d28fab5f clean code reverse 2021-04-13 17:01:20 +08:00
kswang 4ccf305cc6 clear task when exit 2021-04-13 16:54:46 +08:00
chenlei_autodiff 317ffd2585 [graph kernel] clean code for expanders. 2021-04-13 15:49:16 +08:00
mindspore-ci-bot 5908223718 !15048 update redirect of sparsesoftmaxcrossentropywithlogits
From: @mind-lh
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-04-13 15:44:22 +08:00
liuhe b115ad8762 update redirect of SparseSoftmaxCrossEntropyWithLogits 2021-04-13 15:23:46 +08:00
tronzhang b0ce27f1fc remove redundant assignment, delete redundant pass code and add dtor function 2021-04-13 15:10:09 +08:00
laiyongqiang 14d98103a8 increase tbe compile process timeout to 600s 2021-04-13 15:09:33 +08:00
Jiaqi 045474950b add print function in example 2021-04-13 15:04:37 +08:00
shibeiji cef5565c4b fix code check alarms 2021-04-13 15:03:40 +08:00
mindspore-ci-bot 12e8deff54 !15062 revert "fix cpu unsupported type"
From: @huaweib
Reviewed-by: @kisnwang,@jjfeing
Signed-off-by: @jjfeing
2021-04-13 15:01:08 +08:00
mindspore-ci-bot 7bb602a535 !14939 add const pointer
From: @robingrosman
Reviewed-by: @pandoublefeng,@tom__chen,@liucunwei
Signed-off-by: @liucunwei
2021-04-13 14:44:17 +08:00
mindspore-ci-bot 7138acaec0 !15043 fix sponge r1.2
From: @jiahongqian
Reviewed-by: @wang_zi_dong,@ljl0711
Signed-off-by: @ljl0711
2021-04-13 14:38:27 +08:00
mindspore-ci-bot 86ac1f2688 !15037 fix air note of r1.2
From: @caozhou_huawei
Reviewed-by: @zh_qh,@stsuteng
Signed-off-by: @zh_qh
2021-04-13 14:27:37 +08:00
mindspore-ci-bot 68cd837553 !15035 Fixing Codex issues in the debugger
From: @islam_amin
Reviewed-by: @yelihua,@lilongfei15,@yelihua,@yelihua
Signed-off-by: @lilongfei15
2021-04-13 14:24:19 +08:00
mindspore-ci-bot 2907c2cf97 !15059 delete extra whitespace in comments of lazyadam
From: @wangnan39
Reviewed-by: @zh_qh,@kingxian
Signed-off-by: @zh_qh,@kingxian
2021-04-13 14:13:18 +08:00
mindspore-ci-bot 5995e32c78 !14512 Clean codex
From: @jojobugfree
Reviewed-by: @jjfeing,@zhoufeng54,@chujinjin
Signed-off-by: @jjfeing
2021-04-13 14:11:39 +08:00
mindspore-ci-bot 98bacbdb0e !15053 fix review bot
From: @zhangbuxue
Reviewed-by: @zh_qh,@ginfung
Signed-off-by: @zh_qh
2021-04-13 14:08:49 +08:00
mindspore-ci-bot 8072164ceb !15051 [ME]Bot clean with tid
From: @chenfei52
Reviewed-by: @zh_qh,@ginfung
Signed-off-by: @zh_qh
2021-04-13 14:08:09 +08:00
shenwei41 ffeb4f7aca Adaptation runpackage 0412 2021-04-13 12:05:54 +08:00
mindspore-ci-bot 323c42a5c0 !15049 Update document of MulNoNan in ops
From: @dinglinhe123
Reviewed-by: @wuxuejian,@liangchenghui
Signed-off-by: @liangchenghui
2021-04-13 11:29:30 +08:00
donghufeng 0c6841730b change quantum simulator position 2021-04-13 11:28:25 +08:00
mindspore-ci-bot 37d88db4a4 !15014 [MS][LITE][GPU] fix some opencl bugs
From: @wangdongxu6
Reviewed-by: @zhanghaibo5,@ddwsky
Signed-off-by: @zhanghaibo5
2021-04-13 11:26:57 +08:00
hwjiaorui f2c848281a code clean 2021-04-13 11:24:48 +08:00
huangbingjian 2997ec45b7 do not match when there is more than one primal call for J user 2021-04-13 11:23:56 +08:00
wangnan39@huawei.com e86aa7e92e 1.2 delete extra whitespace in comments of lazyadam 2021-04-13 11:19:20 +08:00
baihuawei e1dff8c5ba Revert "add cpu micprecision"
This reverts commit 3716a0da1a.
2021-04-13 11:18:11 +08:00
chenfei_mindspore 0173be9dd2 reduce cyclomatic complexity of code 2021-04-13 10:57:14 +08:00
buxue 168ee1b76b fix review bot
2021-04-13 10:55:13 +08:00
mindspore-ci-bot 16488ae427 !14944 Fix bitwise operator warnings
From: @tiancixiao
Reviewed-by: @liucunwei,@heleiwang
Signed-off-by: @liucunwei
2021-04-13 10:39:35 +08:00
dinglinhe bf5f93503b Update document of MulNoNan 2021-04-13 10:25:20 +08:00
q00596439 15ddee9635 04125 sponge case r1.2 2021-04-13 10:17:13 +08:00
lihongkang 307ad95eb5 clean code warnings 2021-04-13 10:12:53 +08:00
mindspore-ci-bot 2d13cd7a5b !15030 modify adapter tensor size limit
From: @changzherui
Reviewed-by: @kingxian,@zhoufeng54
Signed-off-by: @kingxian
2021-04-13 10:09:16 +08:00
mindspore-ci-bot 0ed87ecb81 !14990 add unet++ 310 mindir infer and update unet 310 infer code
From: @lihongkang1
Reviewed-by: @oacjiewen,@c_34
Signed-off-by: @c_34
2021-04-13 09:58:11 +08:00
mindspore-ci-bot 5acf3f2858 !15019 r1.2 doc typo fix for dot and batch dot
From: @anrui-wang
Reviewed-by: @wuxuejian,@oacjiewen
Signed-off-by: @wuxuejian
2021-04-13 09:54:16 +08:00
mindspore-ci-bot ede3f65d21 !15041 update example of matrixinverse
From: @mind-lh
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-04-13 09:50:25 +08:00
mindspore-ci-bot fd88e3a772 !14968 Some API comments are inadequate and need to be added
From: @wangnan39
Reviewed-by: @gemini524,@zh_qh,@guoqi1024,@kingxian
Signed-off-by: @kingxian
2021-04-13 09:45:55 +08:00
mindspore-ci-bot d0f0aa5d1e !14978 r1.2 Fix codex warning
From: @luoyang42
Reviewed-by: @liucunwei,@heleiwang
Signed-off-by: @liucunwei
2021-04-13 09:40:44 +08:00
wangdongxu d0d688697e fix some bugs 2021-04-13 09:36:54 +08:00
mindspore-ci-bot 8b19b69fc7 !15027 add FusedSparseAdam Ascend docs
From: @yanzhenxiang2020
Reviewed-by: @oacjiewen,@wuxuejian
Signed-off-by: @wuxuejian
2021-04-13 09:34:25 +08:00
liuhe cb7e9c3840 update example of matrixinverse 2021-04-13 09:30:35 +08:00
Lixia Chen 0c00614a3e Reset CacheClient's status for a new run
Turn off lock only after all data is cached
2021-04-12 21:21:10 -04:00
mindspore-ci-bot a419ef4dcd !15021 fix dynamic_mem free double in ReduceAll and ReduceAny in CPU
From: @zhangbuxue
Reviewed-by: @guoqi1024,@guoqi1024,@jjfeing
Signed-off-by: @jjfeing
2021-04-13 09:17:46 +08:00
caozhou 3b05201d20 fix air note of r1.2 2021-04-13 09:12:47 +08:00
Islam Amin a6f0e49c2b Fixing codex issues 2021-04-12 14:59:09 -04:00
changzherui b97b735b7e modify adapter tensor size limit 2021-04-12 22:36:05 +08:00
mindspore-ci-bot e675c3673f !14986 fix code review alarms
From: @shibeiji
Reviewed-by: @oacjiewen,@liangchenghui
Signed-off-by: @liangchenghui
2021-04-12 22:11:06 +08:00
mindspore-ci-bot dedb75b5f1 !15007 fix transpose unsupported float32
From: @liubuyu
Reviewed-by: @jjfeing,@kisnwang
Signed-off-by: @jjfeing
2021-04-12 22:04:22 +08:00
mindspore-ci-bot 3f95838e14 !14731 r1.2 fix sponge ops
From: @jiahongqian
Reviewed-by: @ljl0711
Signed-off-by:
2021-04-12 21:47:17 +08:00
mindspore-ci-bot 3fddd85b29 !14892 dataset: fix codex
From: @ms_yan
Reviewed-by: 
Signed-off-by:
2021-04-12 21:22:32 +08:00
mindspore-ci-bot e161c0a15e !14996 clean code in r1.2
From: @jiangzg001
Reviewed-by: @oacjiewen,@c_34
Signed-off-by: @c_34
2021-04-12 21:15:13 +08:00
mindspore-ci-bot 58b942024e !14928 [ModelZoo]Optimize tprr code 1.2
From: @zhan_ke
Reviewed-by: @oacjiewen,@c_34
Signed-off-by: @c_34
2021-04-12 21:14:49 +08:00
yanzhenxiang2020 1d7457efb5 add FusedSparseAdam Ascend docs 2021-04-12 21:02:16 +08:00
buxue bec6c823bd fix dynamic_mem free double in ReduceAll and ReduceAny in CPU 2021-04-12 20:42:54 +08:00
mindspore-ci-bot af2d0cbb22 !15011 fix compile training
From: @HilbertDavid
Reviewed-by: @jpc_chenjianping,@zhanghaibo5
Signed-off-by: @jpc_chenjianping
2021-04-12 20:26:27 +08:00
mindspore-ci-bot 593878cbe1 !14971 fix codedex
From: @he-botao
Reviewed-by: @oacjiewen,@wuxuejian
Signed-off-by: @wuxuejian
2021-04-12 20:13:19 +08:00
mindspore-ci-bot 3a3984c422 !14980 clean code for CPU ops v1.2
From: @wanyiming
Reviewed-by: @oacjiewen,@wuxuejian
Signed-off-by: @wuxuejian
2021-04-12 20:12:59 +08:00
mindspore-ci-bot 91df14d78d !14943 clean code
From: @hwjiaorui
Reviewed-by: @jjfeing,@zhoufeng54
Signed-off-by: @zhoufeng54
2021-04-12 20:12:26 +08:00
mindspore-ci-bot eade4b2593 !14970 fix codex for cpu backend
From: @caojian05
Reviewed-by: @wuxuejian,@oacjiewen
Signed-off-by: @wuxuejian
2021-04-12 20:11:16 +08:00
mindspore-ci-bot b946be5d30 !14955 add relu fusion check
From: @wilfchen
Reviewed-by: @limingqi107,@cristoval,@cristoval
Signed-off-by: @cristoval
2021-04-12 20:00:13 +08:00
mindspore-ci-bot c3c0c8b53e !14951 example code needs modification
From: @lijiaqi0612
Reviewed-by: @kisnwang,@c_34
Signed-off-by: @c_34
2021-04-12 19:50:59 +08:00
mindspore-ci-bot 7971282190 !14995 update opinfo config file
From: @zhoufeng54
Reviewed-by: @kisnwang,@xu-yfei
Signed-off-by: @xu-yfei
2021-04-12 19:36:49 +08:00
mindspore-ci-bot ed86ca22ab !14954 clean pylint in modelzoo
From: @zhao_ting_v
Reviewed-by: @c_34,@wuxuejian
Signed-off-by: @c_34
2021-04-12 19:31:50 +08:00
zhaodezan 60b7e2afbf fix opencl conv2d test 2021-04-12 19:27:15 +08:00
mindspore-ci-bot 0ce5f4ca4d !14991 fixed codex
From: @anancds
Reviewed-by: @limingqi107,@cristoval
Signed-off-by: @cristoval
2021-04-12 19:19:58 +08:00
mindspore-ci-bot 74545e81e0 !14208 [lite]fix memory leak to 1.2
From: @xu_anyue
Reviewed-by: 
Signed-off-by:
2021-04-12 19:18:19 +08:00
w00535372 fd009d815e r1.2 doc fix for dot and batch dot 2021-04-12 19:07:51 +08:00
mindspore-ci-bot 89a8b0f433 !14974 [cpu]fix codex error in reduceall/reduceany/atangrad/reshape
From: @yanglf1121
Reviewed-by: @wuxuejian,@kisnwang
Signed-off-by: @wuxuejian
2021-04-12 18:46:13 +08:00
mindspore-ci-bot 9340eb62af !14909 Fixing Cyclomatic problem with CheckWatchPoints
From: @islam_amin
Reviewed-by: 
Signed-off-by:
2021-04-12 17:50:54 +08:00
hebotao 08cf6e7517 fix codedex 2021-04-12 17:42:24 +08:00
lz fa289f0017 fix trainDemo compilation 2021-04-12 17:40:27 +08:00
mindspore-ci-bot d699de4e7f !14964 fix weight quant conv bug
From: @xutianchun
Reviewed-by: @zhanghaibo5,@hangangqiang
Signed-off-by: @zhanghaibo5
2021-04-12 17:15:37 +08:00
xuanyue b9cc44ef16 add destructor for infer 2021-04-12 17:05:42 +08:00
liubuyu d142b0cbf4 fix transpose unsupported float32 2021-04-12 17:00:17 +08:00
mindspore-ci-bot ade2d4a945 !14960 clear code warning
From: @wangyanling10
Reviewed-by: @wuxuejian,@liangchenghui
Signed-off-by: @wuxuejian
2021-04-12 16:47:45 +08:00
mindspore-ci-bot 4265b9c3a4 !14898 modify codeDEX_C for R1.2
From: @Somnus2020
Reviewed-by: @zhunaipan,@zhoufeng54
Signed-off-by: @zhoufeng54
2021-04-12 16:44:40 +08:00
mindspore-ci-bot de03813269 !14640 Memory leak fix, clear using cnt map of graph while funcgraph manager free
From: @zhangzhaoju
Reviewed-by: 
Signed-off-by:
2021-04-12 16:33:06 +08:00
chendongsheng 6c38cb0952 fixed codex 2021-04-12 16:08:08 +08:00
mindspore-ci-bot 4a80855c0a !14880 sync to r1.2: update tensor add: support input type of int32, uint32 on CPU kernel
From: @zyx5256
Reviewed-by: @wuxuejian,@liangchenghui
Signed-off-by: @wuxuejian
2021-04-12 16:02:02 +08:00
lihongkang a119423d96 add unet++ 310 mindir infer and update unet 310 infer code 2021-04-12 15:54:58 +08:00
jiangzhenguang 7266d3da83 code clean 2021-04-12 15:54:46 +08:00
zhoufeng e910aac72b update opinfo config file
Signed-off-by: zhoufeng <zhoufeng54@huawei.com>
2021-04-12 15:54:25 +08:00
shibeiji b405d1e28f fix code review alarms 2021-04-12 15:53:48 +08:00
mindspore-ci-bot f7231b2c5f !14979 update document of matrixinverse
From: @mind-lh
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-04-12 15:02:03 +08:00
mindspore-ci-bot e63ec83102 !14950 Fix codedex related to FindPrimalJPair
From: @huangbingjian
Reviewed-by: @ginfung,@zh_qh
Signed-off-by: @zh_qh
2021-04-12 14:51:47 +08:00
YangLuo c1e12e3e93 Fix codex warning 2021-04-12 14:47:14 +08:00
wanyiming eaf5a19724 clean_code2 2021-04-12 14:41:51 +08:00
wangyanling 92144103ba clear code warning 2021-04-12 14:31:39 +08:00
wangnan39@huawei.com 23c85bffa5 add annotations for some api 2021-04-12 14:31:11 +08:00
zhanke bffdc9b6a8 optimize tprr 1.2 2021-04-12 14:26:38 +08:00
ms_yan 25960c72b7 r1.2 fix codex 2021-04-12 14:25:56 +08:00
mindspore-ci-bot d2f8464367 !14945 Fix pylint issues in test_cont_break.py
From: @hwhewei
Reviewed-by: @ginfung,@zh_qh
Signed-off-by: @zh_qh
2021-04-12 14:22:07 +08:00
CaoJian 8252244562 fix codex for cpu backend 2021-04-12 14:20:13 +08:00
xutianchun 5d3361fb55 fix weight conv quant 2021-04-12 14:15:37 +08:00
yanglf1121 48d26645e4 fix cpu ci error 2021-04-12 13:35:00 +08:00
mindspore-ci-bot 8ceaa36efe !14861 for deeptext 310 inference
From: @yuzhenhua666
Reviewed-by: @wuxuejian,@c_34
Signed-off-by: @c_34
2021-04-12 11:39:20 +08:00
liuhe 7175d3deef update document of matrixinverse 2021-04-12 11:29:38 +08:00
zhaoting e147feb3fc clean pylint in modelzoo 2021-04-12 11:29:02 +08:00
lilei 23ad659d58 modify codeDEX_C for R1.2 2021-04-12 11:08:45 +08:00
mindspore-ci-bot 64f29e1c1c !14938 fix debugger recheck
From: @john_tzanakakis
Reviewed-by: @yelihua,@lixiaohui33
Signed-off-by: @lixiaohui33
2021-04-12 10:55:11 +08:00
mindspore-ci-bot 61a34e76bf !14823 Revision of third party component version number requirements
From: @xulei2020
Reviewed-by: @mindwang,@liucunwei
Signed-off-by: @mindwang,@liucunwei
2021-04-12 10:49:14 +08:00
mindspore-ci-bot 5dcc949acf !14870 add_res18_310_in_r1.2
From: @jiangzg001
Reviewed-by: @c_34,@linqingke
Signed-off-by: @c_34
2021-04-12 10:47:44 +08:00
mindspore-ci-bot 7542acd793 !14956 Fix compile error
From: @lichen666
Reviewed-by: @xsmq,@zhoufeng54
Signed-off-by: @xsmq
2021-04-12 10:36:29 +08:00
mindspore-ci-bot 4aa1ec3618 !14830 bugfix in graph kernel for 13B gpt3
From: @coding2020
Reviewed-by: 
Signed-off-by:
2021-04-12 10:29:53 +08:00
lichenever 79e55be238 fix_compile_error 2021-04-12 10:27:04 +08:00
wilfChen 386ccb83e0 add relu pass check 2021-04-12 10:25:40 +08:00
mindspore-ci-bot a118eca5e4 !14866 warning clean
From: @liubuyu
Reviewed-by: @zhoufeng54,@jjfeing
Signed-off-by: @jjfeing
2021-04-12 10:17:03 +08:00
huangbingjian 6f0030f53b fix codedex related to FindPrimalJPair 2021-04-12 10:13:39 +08:00
Jiaqi ff619e0356 example code needs modification 2021-04-12 10:06:27 +08:00
He Wei e112f0fffa Fix pylint issues in test_cont_break.py
Fix 'Useless super delegation in method __init__';
Also fix class naming problem.
2021-04-12 09:49:45 +08:00
Xiao Tianci e135d07c61 fix bitwise operator warning 2021-04-12 09:39:12 +08:00
mindspore-ci-bot 684398b93a !14923 fix 310 inference codex
From: @yuzhenhua666
Reviewed-by: @oacjiewen,@c_34
Signed-off-by: @c_34
2021-04-12 09:37:20 +08:00
mindspore-ci-bot cab364b876 !14867 [Auto parallel] Fix some code-style-warnings in Parallel module
From: @xiaoda_zh
Reviewed-by: @stsuteng,@yangzhenzhang
Signed-off-by: @stsuteng
2021-04-12 09:36:00 +08:00
mindspore-ci-bot 4228ea8982 !14813 [AutoParallel]Fix codex
From: @lichen666
Reviewed-by: 
Signed-off-by:
2021-04-12 09:34:47 +08:00
yuzhenhua f250cecb19 for deeptext 310 inference 2021-04-12 09:31:49 +08:00
mindspore-ci-bot f02d386094 !14884 fix lenet
From: @zhujingxuan
Reviewed-by: @wangchengyuan,@wangchengyuan,@zhanghaibo5
Signed-off-by: @wangchengyuan,@wangchengyuan
2021-04-12 09:23:39 +08:00
hwjiaorui 41ed35e6b4 clean code 2021-04-12 09:21:28 +08:00
mindspore-ci-bot 776c44954d !14757 fix codexr1.2
From: @wuxuejian
Reviewed-by: @oacjiewen,@liangchenghui
Signed-off-by: @liangchenghui
2021-04-12 09:16:35 +08:00
mindspore-ci-bot 41c8050693 !14868 [MS][LITE][r1.2]fix dw-conv 3x3 check
From: @lx0095
Reviewed-by: @zhanghaibo5,@hangangqiang
Signed-off-by: @zhanghaibo5
2021-04-12 09:14:36 +08:00
mindspore-ci-bot b052b5f3cf !14862 codedex clean
From: @hwjiaorui
Reviewed-by: 
Signed-off-by:
2021-04-12 09:08:30 +08:00
mindspore-ci-bot 851f5c4834 !14859 Rectify the overlength of function line
From: @ginfung
Reviewed-by: @zh_qh,@hwhewei
Signed-off-by: @zh_qh
2021-04-12 09:07:45 +08:00
zhangzhaoju f17bbcc49f Memory leak fix, clear using cnt map of graph while funcgraph manager free
2021-04-12 09:05:40 +08:00
RobinGrosman ca79a8fca3 fix const ptr for cpu_profiling.cc 2021-04-11 12:58:28 -07:00
John Tzanakakis 8c6ced51df fix recheck 2021-04-11 15:46:33 -04:00
hwjiaorui 0895bb1a48 codex clean 2021-04-11 13:58:56 +08:00
mindspore-ci-bot 2edf6c54e7 !14919 clean redundant code
From: @zhao_ting_v
Reviewed-by: @oacjiewen,@wuxuejian
Signed-off-by: @wuxuejian
2021-04-10 20:24:24 +08:00
mindspore-ci-bot 28cdf8a983 !14901 Add abstract for maketuple
From: @youui
Reviewed-by: @liangchenghui,@zhoufeng54
Signed-off-by: @liangchenghui
2021-04-10 17:09:53 +08:00
yuzhenhua e07fad0543 for deeptext 310 inference 2021-04-10 17:07:13 +08:00
mindspore-ci-bot b0abf6b530 !14887 fix: wrong input order on br:r1.2
From: @jonyguo
Reviewed-by: @liucunwei,@heleiwang
Signed-off-by: @liucunwei
2021-04-10 16:53:05 +08:00
mindspore-ci-bot 83f55d974e !14912 Remove redundant code r1.2
From: @ezphlow
Reviewed-by: @liucunwei,@heleiwang
Signed-off-by: @liucunwei
2021-04-10 16:48:13 +08:00
mindspore-ci-bot 94a475abf4 !14752 Address some code issues in MD to r1.2
From: @hfarahat
Reviewed-by: @robingrosman,@mikef,@liucunwei
Signed-off-by: @liucunwei
2021-04-10 16:42:43 +08:00
mindspore-ci-bot 0f8330ec89 !14821 fix codedex and add docstring for pull_based MD
From: @mhmotallebi
Reviewed-by: 
Signed-off-by:
2021-04-10 16:35:48 +08:00
mindspore-ci-bot 1766f12888 !14907 Address codex warning for CLUENode::CreateKepMapForBuild() r1.2 branch
From: @lixiachen
Reviewed-by: @robingrosman,@liucunwei
Signed-off-by: @liucunwei
2021-04-10 16:35:16 +08:00
mindspore-ci-bot 74668f0226 !14877 [ME]Fix codedex
From: @Margaret_wangrui
Reviewed-by: @ginfung,@ginfung,@zh_qh
Signed-off-by: @zh_qh
2021-04-10 16:34:48 +08:00
mindspore-ci-bot bdbbb82a1a !14083 Support CPU tinybert and ner task
From: @zhao_ting_v
Reviewed-by: @wuxuejian,@oacjiewen
Signed-off-by: @wuxuejian
2021-04-10 16:03:02 +08:00
mindspore-ci-bot 4f54cccccb !14889 clean static checking
From: @zhao_ting_v
Reviewed-by: @wuxuejian,@liangchenghui
Signed-off-by: @wuxuejian
2021-04-10 15:50:17 +08:00
zhaoting 229f6dde29 clean redundant code 2021-04-10 15:28:11 +08:00
mindspore-ci-bot 2af7c51019 !14914 fix codex warnings related to GPU kernels r1.2
From: @TFbunny
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-04-10 14:27:42 +08:00
zhaoting b90e7ea196 Support CPU tinybert and ner task 2021-04-10 14:12:08 +08:00
mindspore-ci-bot a857993972 !14857 fix codedex and bot
From: @fangzehua
Reviewed-by: @wuxuejian,@liangchenghui
Signed-off-by: @wuxuejian
2021-04-10 11:19:38 +08:00
TFBunny 5453fc8744 fix codex warnings 2021-04-09 22:30:59 -04:00
mindspore-ci-bot dfd1198572 !14902 Fix codex and index_add doc text
From: @tom__chen
Reviewed-by: @pandoublefeng,@robingrosman,@liangchenghui
Signed-off-by: @liangchenghui
2021-04-10 09:53:02 +08:00
Eric 000956a5e9 Remove commented code 2021-04-09 16:52:11 -04:00
Margaret_wangrui 06bbb24487 fix codedex 2021-04-10 03:19:29 +08:00
Islam Amin 2b57cc97a7 Fix cyclomatic complexity issue CheckWatchPoints 2021-04-09 13:39:34 -04:00
Lixia Chen d2aa1d7d89 Make CLUENode::CreateKepMap() less than 50 lines 2021-04-09 12:18:41 -04:00
mindspore-ci-bot a57c4e693d !14876 Fix FindPrimalJPair
From: @huangbingjian
Reviewed-by: @ginfung,@zh_qh
Signed-off-by: @zh_qh
2021-04-09 21:05:34 +08:00
tom__chen fe7ffa1092 fix codex and doc text 2021-04-09 08:51:05 -04:00
mindspore-ci-bot f7ff861fc3 !14863 Remove redundant code in r1.2
From: @liangzhibo
Reviewed-by: @zh_qh,@ginfung
Signed-off-by: @zh_qh
2021-04-09 20:36:13 +08:00
mindspore-ci-bot a9fe5b5ca1 !14700 [GPU] index_add op indices tensor must be 1D and add adapter of Mod, MaxPool3D and BCEWithLogitsLoss operators for graphengine and fix the backward of matrixinverse.
From: @wangshuide2020
Reviewed-by: @liangchenghui,@oacjiewen
Signed-off-by: @liangchenghui
2021-04-09 20:25:30 +08:00
mindspore-ci-bot 65b13b5d67 !14900 update the description of input data type of mulnonan.
From: @wangshuide2020
Reviewed-by: @liangchenghui,@oacjiewen
Signed-off-by: @liangchenghui
2021-04-09 20:25:05 +08:00
yangwei f235c2218a Add abstract for maketuple 2021-04-09 20:17:50 +08:00
wangshuide2020 c7148c3ab4 update the description of input data type of mulnonan. 2021-04-09 20:07:25 +08:00
liubuyu 136fdc7144 warning clean 2021-04-09 18:56:00 +08:00
zhaoting 519de5c32b clean static checking 2021-04-09 18:32:15 +08:00
jonyguo ad336fa544 fix: wrong input order in test_paddeddataset.py 2021-04-09 17:34:28 +08:00
Xiaoda Zhang 6016f2eab3 fix some code-style warning 2021-04-09 17:30:11 +08:00
mindspore-ci-bot 5b5891bb14 !14856 fix a bug with allreduce in pynative mode
From: @lvchangquan
Reviewed-by: @zhoufeng54,@kisnwang
Signed-off-by: @chujinjin
2021-04-09 17:22:33 +08:00
mindspore-ci-bot a64c0eeb17 !14820 modify export mindir tuple[] for 1.2
From: @changzherui
Reviewed-by: @zhoufeng54,@kingxian
Signed-off-by: @kingxian
2021-04-09 17:20:01 +08:00
mindspore-ci-bot 7f3d76dbfa !14850 clean code sc warning
From: @zhoufeng54
Reviewed-by: @xu-yfei,@kisnwang
Signed-off-by: @xu-yfei
2021-04-09 17:16:13 +08:00
mindspore-ci-bot 9ad84db50e !14804 Fix codedex warnings of Canny and GaussianBlur ops on r1.2
From: @tiancixiao
Reviewed-by: @heleiwang,@liucunwei
Signed-off-by: @liucunwei
2021-04-09 17:14:41 +08:00
mindspore-ci-bot b9dc51e823 !14803 fix bug to r1.2
From: @shenwei41
Reviewed-by: 
Signed-off-by:
2021-04-09 17:13:01 +08:00
mindspore-ci-bot 010f40d2b6 !14851 Fix codedex
From: @huangbingjian
Reviewed-by: @zh_qh,@ginfung
Signed-off-by: @zh_qh
2021-04-09 17:07:04 +08:00
zhujingxuan 1967825443 fix lenet 2021-04-09 17:04:32 +08:00
mindspore-ci-bot 27c9c2b7f5 !14812 python3.8 run mindspore retinaface resnet50 is slower than python3.7
From: @xiefangqi
Reviewed-by: @heleiwang,@liucunwei
Signed-off-by: @liucunwei
2021-04-09 17:03:04 +08:00
cy 1cc75f7065 fix bugs 2021-04-09 16:55:56 +08:00
mindspore-ci-bot 16f44980c5 !14842 demo add annotation
From: @yeyunpeng2020
Reviewed-by: @zhanghaibo5,@ddwsky
Signed-off-by: @ddwsky
2021-04-09 16:53:54 +08:00
wangshuide2020 67b1d265ca 1.index_add GPU op check rank of indices tensor.
2.add adapter of Mod, MaxPool3D and BCEWithLogitsLoss operators for graphengine.
3.fix the backward of matrixinverse.
2021-04-09 16:53:09 +08:00
q00596439 dc61fb6ad2 04085 sponge operators 2021-04-09 16:38:54 +08:00
mindspore-ci-bot 110996a423 !14537 [MS][LITE]rewrite fp32 to fp16
From: @cjh9368
Reviewed-by: @zhang_xue_tong,@zhanghaibo5
Signed-off-by: @zhang_xue_tong
2021-04-09 16:37:39 +08:00
zhuyuxiao 73aeca65e7 tensor add update: support int, uint operation on CPU kernel 2021-04-09 16:32:14 +08:00
mindspore-ci-bot 1448560c5f !14871 fix np.concatenate doc error
From: @yanglf1121
Reviewed-by: @zhunaipan,@liangchenghui
Signed-off-by: @liangchenghui
2021-04-09 16:27:40 +08:00
mindspore-ci-bot 0235c08957 !14831 [MS][LITE]add models and modify the script of entrance guard
From: @probiotics_53
Reviewed-by: @zhang_xue_tong,@zhanghaibo5
Signed-off-by: @zhang_xue_tong
2021-04-09 16:12:17 +08:00
huangbingjian 021c50cbc5 do not throw exception when the forward network is called more than once 2021-04-09 16:04:57 +08:00
lixian 5dd8deb3ac fix dw-conv 3x3 check 2021-04-09 15:38:55 +08:00
jiangzhenguang c796402735 add resnet18 310 infer in readme 2021-04-09 15:38:44 +08:00
yanglf1121 91afcc6cb4 fix concatenate doc error 2021-04-09 15:37:58 +08:00
cjh9368 2ebc368f21 [MS][LITE] clear static error 2021-04-09 15:37:06 +08:00
mindspore-ci-bot ef27f1f663 !14853 fixed the bad links
From: @oacjiewen
Reviewed-by: @wuxuejian,@c_34
Signed-off-by: @wuxuejian
2021-04-09 15:31:25 +08:00
yujianfeng 5a1b21b6c8 rectify the overlength of function line 2021-04-09 15:11:40 +08:00
mindspore-ci-bot 923f7160b0 !14829 fix is_dynamic_format Description
From: @jjfeing
Reviewed-by: @zhoufeng54,@chujinjin
Signed-off-by: @xu-yfei
2021-04-09 15:08:17 +08:00
l00591931 11e6bc270d Delete redundant code 2021-04-09 15:03:16 +08:00
fangzehua fd312a4682 fix codedex and bot 2021-04-09 15:03:14 +08:00
lichenever 2abac948e0 fix_codex_r1.2 2021-04-09 14:50:55 +08:00
caojiewen bdc3110466 fixed the bad links 2021-04-09 14:47:35 +08:00
mindspore-ci-bot cc996f9bdd !14810 Modify the examples in explainer.benchmark to make them executable seperately
From: @lixiaohui33
Reviewed-by: @yelihua,@ouwenchang
Signed-off-by: @lilongfei15
2021-04-09 14:42:17 +08:00
lvchangquan 2ac61cf2b4 fix a allreduce bug in pynative mode 2021-04-09 14:41:43 +08:00
huangbingjian ca7f5dc58e fix codex 2021-04-09 14:38:52 +08:00
zengxianglong 4869f19578 add models and modify the script of entrance guard 2021-04-09 14:38:50 +08:00
xiefangqi 07c26be796 fix python3.8 multi-processing performance issue 2021-04-09 14:19:21 +08:00
mindspore-ci-bot 4dc159d3cd !14834 r1.2 Bug fix and Raise added for dot ops
From: @anrui-wang
Reviewed-by: @liangchenghui,@c_34
Signed-off-by: @liangchenghui
2021-04-09 14:18:53 +08:00
mindspore-ci-bot 5b08db8227 !14837 fix CppLint
From: @zhujingxuan
Reviewed-by: @jpc_chenjianping,@wangchengyuan
Signed-off-by: @jpc_chenjianping
2021-04-09 11:57:06 +08:00
zhoufeng 6914f33705 clean code sc warning
Signed-off-by: zhoufeng <zhoufeng54@huawei.com>
2021-04-09 11:53:28 +08:00
mindspore-ci-bot 7bcd37ff8e !14793 add parallel executor stub in create group without npu/gpu
From: @yao_yf
Reviewed-by: @yangzhenzhang,@stsuteng
Signed-off-by: @stsuteng
2021-04-09 11:43:07 +08:00
yeyunpeng2020 af98da1731 demo add annotation 2021-04-09 11:20:31 +08:00
mindspore-ci-bot 1c86e9d851 !14789 [AutoParallel]fix auto parallel gather bug
From: @lichen666
Reviewed-by: @kisnwang,@zhunaipan
Signed-off-by: @zhunaipan
2021-04-09 11:01:35 +08:00
w00535372 fb3b8496a0 r1.2 bug fix for dot ops 2021-04-09 10:55:08 +08:00
mindspore-ci-bot 86f0fb61d4 !14785 add supplementary notes of mobilenetv2 and ssd in README
From: @zhao_ting_v
Reviewed-by: @oacjiewen,@c_34
Signed-off-by: @c_34
2021-04-09 10:54:27 +08:00
zhujingxuan a3f0a99613 fix CppLint 2021-04-09 10:49:49 +08:00
shenwei41 93d594a139 fix bug to master 2021-04-09 10:20:51 +08:00
jjfeing b6b86904bd fix is dynamic format api description 2021-04-09 09:59:04 +08:00
hesham 01d318050f Address 2 codex issues:
1- CreateFromNPArray function is >50 lines
2- duplicate variable init in generator_op.cc
2021-04-08 21:41:21 -04:00
lixiaohui dfd3320ffb modify examples of explainer.benchmark 2021-04-09 09:15:18 +08:00
xulei2020 605f47c130 Revision of third party component version number requirements 2021-04-09 08:58:31 +08:00
mohammad 139d83fc9a fix codeDEX and add docstrings 2021-04-08 13:13:50 -04:00
changzherui 97138db981 modify export mindir tuple[] for 1.2 2021-04-09 00:10:55 +08:00
mindspore-ci-bot 5ab1171bb6 !14806 Fix explainer API documents bug to make it executable
From: @lixiaohui33
Reviewed-by: @ouwenchang,@yelihua
Signed-off-by: @yelihua
2021-04-08 22:44:16 +08:00
mindspore-ci-bot 902d4d35ca !14765 add dump data function for CPU
From: @zhangbuxue
Reviewed-by: @zhaizhiqiang,@jjfeing
Signed-off-by: @jjfeing
2021-04-08 21:22:16 +08:00
mindspore-ci-bot 664fe91075 !14777 fix TAINTED_SCALAR and capitalize constants
From: @luopengting
Reviewed-by: @ouwenchang,@lixiaohui33
Signed-off-by: @lixiaohui33
2021-04-08 20:52:37 +08:00
yao_yf 96861fc3c9 add parallel executor stub when creating group without npu/gpu 2021-04-08 20:38:03 +08:00
lixiaohui 2cf08b5dc5 fix explainer doc bugs 2021-04-08 20:05:43 +08:00
Xiao Tianci e1ed4b592e fix some code warnings 2021-04-08 19:37:31 +08:00
mindspore-ci-bot 2afa3a6eb9 !14691 fix fastrcnn error for pynative
From: @chujinjin
Reviewed-by: @jjfeing,@kisnwang
Signed-off-by: @kisnwang
2021-04-08 19:12:21 +08:00
mindspore-ci-bot 68093fa5d1 !14692 fix abs cache error for pynative
From: @chujinjin
Reviewed-by: 
Signed-off-by:
2021-04-08 19:11:56 +08:00
mindspore-ci-bot 9ec9164064 !14766 Fix codex & pylint & cpplint problem for r1.2
From: @xiefangqi
Reviewed-by: @heleiwang,@liucunwei
Signed-off-by: @liucunwei
2021-04-08 19:05:38 +08:00
mindspore-ci-bot dd997faa30 !14764 update document of KLDivLoss
From: @mind-lh
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-04-08 17:23:10 +08:00
mindspore-ci-bot 12bc103ec6 !14787 update document of Select
From: @dinglinhe123
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-04-08 17:20:40 +08:00
mindspore-ci-bot c1bb1b02ec !14744 add is dynamic format op info
From: @jjfeing
Reviewed-by: @kisnwang,@zhoufeng54
Signed-off-by:
2021-04-08 17:11:49 +08:00
lichenever 8ce979ec6c fix_auto_parallel_gatherv2_bug_r1.2 2021-04-08 17:08:09 +08:00
mindspore-ci-bot 16002ea438 !14761 memcpy_s function reports errors when tensor is greater than 2147483647 bit
From: @wangyanling10
Reviewed-by: @wuxuejian,@liangchenghui
Signed-off-by: @wuxuejian
2021-04-08 17:03:43 +08:00
mindspore-ci-bot 52c7385a55 !14734 fixed core dump
From: @anancds
Reviewed-by: 
Signed-off-by:
2021-04-08 16:37:18 +08:00
zhaoting ae0ac6a837 add supplementary notes of mobilenetv2 and ssd 2021-04-08 16:36:26 +08:00
dinglinhe f1780a03a4 update document of Select 2021-04-08 16:17:24 +08:00
mindspore-ci-bot de3a9a80cc !14772 dataset: add error msg when cpu_count is less than 8
From: @ms_yan
Reviewed-by: @heleiwang,@liucunwei
Signed-off-by: @liucunwei
2021-04-08 16:05:20 +08:00
mindspore-ci-bot b21637ed23 !14687 fix a split index bug in execute allreduce
From: @lvchangquan
Reviewed-by: @kisnwang,@chujinjin
Signed-off-by:
2021-04-08 16:03:51 +08:00
buxue 8f112f147b add dump data function for CPU 2021-04-08 14:30:25 +08:00
mindspore-ci-bot 8eb795fa48 !14754 r1.2 fix codex warning
From: @luoyang42
Reviewed-by: @heleiwang,@liucunwei
Signed-off-by: @liucunwei
2021-04-08 14:23:04 +08:00
mindspore-ci-bot 740b2abf3e !14758 Update findPrimalJPair
From: @huangbingjian
Reviewed-by: @zh_qh,@ginfung
Signed-off-by: @zh_qh
2021-04-08 14:21:22 +08:00
luopengting 09c432aceb fix TAINTED_SCALAR and capitalize constants
1. fix the tainted variable 'path'
2. capitalize constants in env_config_parser
3. move constants about attribute to common utils.h
2021-04-08 14:19:58 +08:00
ms_yan efa69c7112 add msg for windows case(has less cpu) and tdt 2021-04-08 11:54:29 +08:00
mindspore-ci-bot 3a7e803024 !14711 update document of Select
From: @dinglinhe123
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-04-08 11:44:19 +08:00
liuhe 04b3866375 update document of KLDivLoss 2021-04-08 11:08:01 +08:00
chendongsheng 2a9f104ac0 fixed Fatal python error 2021-04-08 10:51:18 +08:00
mindspore-ci-bot f1636ef504 !14746 Change Tensor zero dimension check code to make it faster. (r1.2)
From: @liangzhibo
Reviewed-by: @ginfung,@zh_qh
Signed-off-by: @zh_qh
2021-04-08 10:36:24 +08:00
xiefangqi f63e359ac8 fix codex&cpplint&pylint 2021-04-08 10:03:20 +08:00
wangyanling 04aee9d007 fix slice op bug 2021-04-08 09:58:30 +08:00
wuxuejian af4b9c1234 fix codexr1.2 2021-04-08 09:58:21 +08:00
lvchangquan 3637f61dab fix a split index bug in execute allreduce 2021-04-08 09:57:48 +08:00
huangbingjian c5175129b1 modify findPrimalJPair 2021-04-08 09:56:26 +08:00
mindspore-ci-bot 45629aaeb3 !14719 modify retinanet readme & performance improving
From: @chenmai1102
Reviewed-by: @wuxuejian,@oacjiewen
Signed-off-by: @wuxuejian
2021-04-08 09:52:28 +08:00
mindspore-ci-bot b6335cc743 !14685 r1.2 Add additional input check for batch dot op
From: @anrui-wang
Reviewed-by: 
Signed-off-by:
2021-04-08 09:45:27 +08:00
mindspore-ci-bot 3fe401feb5 !14729 insert u monad parameter before io monad parameter in auto_monad
From: @Margaret_wangrui
Reviewed-by: @hwhewei,@zh_qh
Signed-off-by: @zh_qh
2021-04-08 09:32:52 +08:00
YangLuo 6f2b4fa665 fix codex warning 2021-04-08 09:27:23 +08:00
jjfeing 7184a1d656 add is dynamic format op info 2021-04-08 09:25:17 +08:00
mindspore-ci-bot 14cb3d692b !14721 simplify windows bat script
From: @lyvette
Reviewed-by: @hangangqiang,@zhang_xue_tong
Signed-off-by: @hangangqiang
2021-04-08 09:24:46 +08:00
mindspore-ci-bot 384fafd3f2 !14753 Codex fix, nullptr isn't checked after a dynamic cast
From: @ziruiwu
Reviewed-by: @pandoublefeng,@robingrosman
Signed-off-by: @pandoublefeng
2021-04-08 04:57:22 +08:00
mindspore-ci-bot 4178ac51ba !14351 codex and cpplink fixes minddata profiling code
From: @robingrosman
Reviewed-by: @pandoublefeng,@nsyca
Signed-off-by: @pandoublefeng
2021-04-08 03:51:26 +08:00
Zirui Wu f275563333 Codex fix, nullptr isn't checked after a dynamic cast 2021-04-07 13:42:54 -04:00
w00535372 57c4b74e94 r1.2 Bug fix for batch dot 2021-04-07 21:01:24 +08:00
mindspore-ci-bot 10514263eb !14727 remove unused code && update doc
From: @zhujingxuan
Reviewed-by: @zhanghaibo5,@wangchengyuan
Signed-off-by: @zhanghaibo5
2021-04-07 20:40:03 +08:00
l00591931 4c0369aaeb Change Tensor zero dimension check to make it faster 2021-04-07 20:21:36 +08:00
mindspore-ci-bot b589483cd1 !14717 Mindspore r1.2 Reduce/Transpose/TensorAdd add multi-thread support and fix reduce bug
From: @dangjiaqi1
Reviewed-by: @wuxuejian,@liangchenghui
Signed-off-by: @wuxuejian
2021-04-07 20:19:17 +08:00
mindspore-ci-bot 424b068af7 !14716 fix the problem that compiling the python385/390 versions of the daily build takes too long
From: @fan-jibin
Reviewed-by: @xsmq,@zhunaipan
Signed-off-by: @zhunaipan
2021-04-07 19:26:44 +08:00
chujinjin 450d94733c fix abs cache error 2021-04-07 19:24:07 +08:00
chujinjin 31ee29e7d5 fix fastrcnn for pynative 2021-04-07 19:23:47 +08:00
dinglinhe ba0b21f884 update document of Select 2021-04-07 18:57:32 +08:00
mindspore-ci-bot f2f2af9105 !14705 [r1.2] MindData Codespell correction
From: @luoyang42
Reviewed-by: @heleiwang,@liucunwei
Signed-off-by: @liucunwei
2021-04-07 18:29:25 +08:00
zhujingxuan 94346c241a remove unused code && update doc 2021-04-07 17:40:15 +08:00
mindspore-ci-bot 72763713be !14733 fix run npu multithread poor perf
From: @zhaozhenlong
Reviewed-by: @zhang_xue_tong,@zhanghaibo5
Signed-off-by: @zhang_xue_tong
2021-04-07 17:27:07 +08:00
mindspore-ci-bot 48854db845 !14713 fix parallel ut probabilistic timeout
From: @yao_yf
Reviewed-by: @kisnwang,@stsuteng
Signed-off-by: @stsuteng
2021-04-07 17:04:49 +08:00
zhaozhenlong b2e795f706 do not bind core on npu 2021-04-07 16:36:23 +08:00
Margaret_wangrui 3824c91283 insert u monad parameter before io monad parameter in auto_monad 2021-04-07 15:51:19 +08:00
mindspore-ci-bot 98082e44f9 !14672 update document of Conv3d
From: @mind-lh
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-04-07 15:38:49 +08:00
mindspore-ci-bot c3eea27fab !14683 update example
From: @zhujingxuan
Reviewed-by: @wangchengyuan,@zhanghaibo5
Signed-off-by: @wangchengyuan
2021-04-07 15:30:46 +08:00
liuhe 6b4f421287 update document of Conv3d 2021-04-07 15:18:52 +08:00
mindspore-ci-bot 8dacfb6988 !14708 fix weight update mod bug
From: @xutianchun
Reviewed-by: @HilbertDavid,@zhanghaibo5
Signed-off-by: @HilbertDavid
2021-04-07 14:59:58 +08:00
liuyu ab3b6cf7e9 modify windows bat script 2021-04-07 14:51:14 +08:00
mindspore-ci-bot a0e21bfe98 !14690 Synchronize latest Ascend software suite 06 Apr 2021
From: @nicholas_yhr
Reviewed-by: @lilongfei15,@xsmq
Signed-off-by: @lilongfei15
2021-04-07 14:29:09 +08:00
陈劢 fc44aa1d20 modify retinanet readme 2021-04-07 14:12:44 +08:00
yangchun 278e8dd575 Reduce/Transpose/TensorAdd add multi-thread support and fix reduce bug 2021-04-07 11:51:47 +08:00
mindspore-ci-bot e09ad7c1f6 !14697 update mnist_stm32f746 example codes
From: @zoloft
Reviewed-by: @wangchengyuan,@hangangqiang
Signed-off-by: @wangchengyuan
2021-04-07 11:51:24 +08:00
范吉斌 519c1d85dc reduce the compile time of the py385/py390 versions 2021-04-07 11:51:02 +08:00
mindspore-ci-bot a5234a9ef0 !14677 [MD] fix bug for codespell for r1.2
From: @xulei2020
Reviewed-by: @pandoublefeng,@heleiwang
Signed-off-by: @pandoublefeng
2021-04-07 11:40:00 +08:00
yao_yf c9f3bd7cea fix parallel ut probabilistic timeout 2021-04-07 11:29:58 +08:00
xutianchun d365f0a296 fix weight_update_mod bug 2021-04-07 10:49:35 +08:00
luoyang 93671501bf MindData Codespell correction 2021-04-07 10:28:42 +08:00
z00512249 e1b41e6b27 update mnist_stm32f746 example codes 2021-04-07 10:15:02 +08:00
mindspore-ci-bot 78092ada25 !14568 clean code
From: @hwjiaorui
Reviewed-by: @jjfeing,@zhoufeng54
Signed-off-by:
2021-04-07 09:33:40 +08:00
mindspore-ci-bot 9777199f75 !14463 clean code
From: @hwjiaorui
Reviewed-by: @jjfeing,@zhoufeng54,@jjfeing
Signed-off-by: @lilongfei15
2021-04-07 09:33:10 +08:00
mindspore-ci-bot 6663c36f7b !14688 Fix bug of cuda device id
From: @jojobugfree
Reviewed-by: @chujinjin,@zhoufeng54
Signed-off-by: @lilongfei15
2021-04-07 09:31:30 +08:00
mindspore-ci-bot 59ab43c851 !14674 [MS][LITE]remove some copy codes on branch 1.2
From: @sishuikang
Reviewed-by: @zhanghaibo5,@lilongfei15
Signed-off-by: @zhanghaibo5
2021-04-07 09:30:03 +08:00
mindspore-ci-bot e7055e37fe !14673 [MSLITE][Develop] fix bug of arm cpu op resize_bicubic
From: @yangruoqi713
Reviewed-by: @zhanghaibo5,@lilongfei15
Signed-off-by: @zhanghaibo5
2021-04-07 09:29:45 +08:00
mindspore-ci-bot fa39610bf4 !14663 refactor eltwisegrad ops for cpu: sync to r1.2
From: @zyx5256
Reviewed-by: @wuxuejian,@liangchenghui
Signed-off-by: @wuxuejian
2021-04-07 09:28:17 +08:00
mindspore-ci-bot 0f8bd6a0ee !14679 fix some problems in docs of mindspore.nn.probability
From: @bingyaweng
Reviewed-by: @zichun_ye,@sunnybeike
Signed-off-by: @sunnybeike
2021-04-07 09:13:05 +08:00
caifubi 2c833bebbe Fix bug of wrong cuda device id 2021-04-06 21:53:30 +08:00
yanghaoran ee05460a76 Synchronize latest Ascend software suite 06 Apr 2021 2021-04-06 21:35:57 +08:00
zhujingxuan 18a86d6ce1 update example 2021-04-06 20:23:51 +08:00
zhangxinfeng3 c1935bf187 fix some problems in docs of mindspore.nn.probability 2021-04-06 19:37:49 +08:00
mindspore-ci-bot 778b72dba0 !14670 dataset: remove manager from datasets.py
From: @ms_yan
Reviewed-by: @liucunwei,@heleiwang
Signed-off-by: @liucunwei
2021-04-06 19:30:25 +08:00
xulei2020 e8fcbe7a1c fix bug for codespell 2021-04-06 18:36:41 +08:00
mindspore-ci-bot 18a3d3beca !14655 [MD] fix unsigned overflow when calling memcpy_s in r1.2
From: @liyong126
Reviewed-by: @heleiwang,@liucunwei
Signed-off-by: @liucunwei
2021-04-06 18:31:22 +08:00
mindspore-ci-bot 0ebf0391cd !14666 [MSLITE] Fix bug of memory leakage.
From: @wang_shaocong
Reviewed-by: @zhanghaibo5,@hangangqiang
Signed-off-by: @zhanghaibo5
2021-04-06 17:22:51 +08:00
hukang hwx963878 2447000ca3 pet classification update 1.1.1
fix some codes to remove copy
2021-04-06 17:14:06 +08:00
mindspore-ci-bot f2fd3c5e85 !14564 fix cpu operator with unsupported type
From: @huaweib
Reviewed-by: @jjfeing,@kisnwang
Signed-off-by: @jjfeing
2021-04-06 16:59:15 +08:00
yangruoqi713 91523fa7b7 [MSLITE][Develop] fix bug of arm cpu op resize_bicubic 2021-04-06 16:47:30 +08:00
zhuyuxiao 7fae5734be refactor eltwisegrad ops for cpu 2021-04-06 16:26:41 +08:00
mindspore-ci-bot 96ebe7edf1 !14598 correct the grammar error.
From: @yepei6
Reviewed-by: @kingxian,@zh_qh
Signed-off-by: @kingxian
2021-04-06 16:19:52 +08:00
ms_yan 35ab6076fc remove manager 2021-04-06 16:11:23 +08:00
mindspore-ci-bot 654b41fa13 !14522 3d graph reconstruct
From: @liubuyu
Reviewed-by: @zhoufeng54,@jjfeing
Signed-off-by:
2021-04-06 15:56:55 +08:00
wang_shaocong 14bbc01a7d [MSLITE] fix bug of memory leakage. 2021-04-06 15:56:11 +08:00
mindspore-ci-bot d793e8fa18 !14650 fix broken package metadata content
From: @nicholas_yhr
Reviewed-by: @jjfeing,@kisnwang,@zhoufeng54
Signed-off-by: @lilongfei15
2021-04-06 15:41:32 +08:00
mindspore-ci-bot 9ad40b0923 !14595 lite typo 1.2
From: @ling_qiao_min
Reviewed-by: @zhanghaibo5,@hangangqiang
Signed-off-by: @zhanghaibo5
2021-04-06 15:07:31 +08:00
mindspore-ci-bot 5723726ec7 !14516 [MD][BUG] fix pythonTokenizer bug in windows for r1.2
From: @xulei2020
Reviewed-by: @leonwanghui,@liucunwei
Signed-off-by: @liucunwei
2021-04-06 14:50:44 +08:00
liyong a83c9bdd6a fix signed overflow 2021-04-06 14:34:28 +08:00
mindspore-ci-bot b7c3e2baaf !14591 fix codedex in RDR
From: @luopengting
Reviewed-by: @ouwenchang,@lilongfei15
Signed-off-by: @lilongfei15
2021-04-06 14:16:43 +08:00
yanghaoran b8e3322d69 fix broken package metadata content 2021-04-06 11:11:56 +08:00
mindspore-ci-bot 7e21427a2a !14559 fix export mindir failure for fasterrcnn
From: @zhouneng2
Reviewed-by: @linqingke
Signed-off-by: @linqingke
2021-04-06 10:13:38 +08:00
mindspore-ci-bot 69ce0c1c0a !14629 support negative index
From: @youui
Reviewed-by: @zh_qh,@zhoufeng54
Signed-off-by: @zh_qh
2021-04-06 10:05:34 +08:00
mindspore-ci-bot 1fb9400357 !14621 dataset: add sending batch control option
From: @ms_yan
Reviewed-by: @liucunwei
Signed-off-by: @liucunwei
2021-04-06 09:22:50 +08:00
mindspore-ci-bot 96d56f2d64 !14625 update document of MatrixInverse
From: @mind-lh
Reviewed-by: @zh_qh,@liangchenghui
Signed-off-by: @liangchenghui
2021-04-06 09:13:50 +08:00
mindspore-ci-bot 0686767e7d !14611 rename-pkg_name-for-micro
From: @yangjie159
Reviewed-by: @wangchengyuan
Signed-off-by: @wangchengyuan
2021-04-06 09:12:22 +08:00
ms_yan 3073757668 add sending batch interface 2021-04-05 23:41:15 +08:00
mindspore-ci-bot 934669f947 !14626 Fix Gelu in select ops
From: @liangzhibo
Reviewed-by: @ginfung,@zh_qh
Signed-off-by: @zh_qh
2021-04-03 18:12:45 +08:00
mindspore-ci-bot 2c6fdea1f9 !14630 parallel ut fix r1.2
From: @yao_yf
Reviewed-by: @stsuteng,@kisnwang
Signed-off-by: @stsuteng
2021-04-03 17:47:42 +08:00
mindspore-ci-bot 2e7a37c638 !14555 fix BUG to r1.2
From: @shenwei41
Reviewed-by: @liucunwei,@pandoublefeng
Signed-off-by: @liucunwei
2021-04-03 06:29:51 +08:00
yao_yf 2934081ccc parallel ut fix r1.2 2021-04-02 21:35:36 +08:00
mindspore-ci-bot 6a76fa36b4 !14628 Synchronize latest Ascend software suite 02 Apr 2021
From: @nicholas_yhr
Reviewed-by: @xsmq,@zhoufeng54
Signed-off-by: @xsmq
2021-04-02 19:25:36 +08:00
yangwei 04655215dc support negative index 2021-04-02 18:16:21 +08:00
mindspore-ci-bot e13e2f1333 !14412 fix a bug with launch allreduce in pynative mode
From: @lvchangquan
Reviewed-by: @chujinjin,@jjfeing,@lilongfei15
Signed-off-by: @lilongfei15
2021-04-02 17:59:41 +08:00
yangjie159 4e8d26f823 rename-pkg_name-for-micro 2021-04-02 17:40:13 +08:00
yanghaoran f2b7fcf73a Synchronize latest Ascend software suite 02 Apr 2021 2021-04-02 17:28:53 +08:00
l00591931 05923d01ba fix gelu 2021-04-02 17:15:34 +08:00
liuhe 7b72995f99 update document and example of MatrixInverse 2021-04-02 17:12:57 +08:00
mindspore-ci-bot 33524f2c51 !14603 fix pylint
From: @huangbingjian
Reviewed-by: @ginfung,@zh_qh
Signed-off-by: @zh_qh
2021-04-02 15:06:53 +08:00
luopengting 56bc6b4eb6 fix codedex in RDR
1. check whether path is empty in IsEveryFilenameValid().
2. do not print the string in IsStrLengthValid(), it's not safe.
3. make the constants about hash in recorder_manager.h.
4. initialize some variables.
2021-04-02 14:38:25 +08:00
mindspore-ci-bot 434d2408e2 !14575 [MD][r1.2] Fix logger info in __del__ error
From: @xiefangqi
Reviewed-by: @liucunwei,@heleiwang
Signed-off-by: @liucunwei
2021-04-02 14:21:07 +08:00
mindspore-ci-bot 49d522e8a7 !14420 Remove the constraint that the gpu profiler is initialized in a specific position
From: @gzhcv
Reviewed-by: @ouwenchang,@lilongfei15
Signed-off-by: @lilongfei15
2021-04-02 14:15:24 +08:00
zhouneng 9b8c2970bf fix export mindir failure for fasterrcnn 2021-04-02 14:15:06 +08:00
mindspore-ci-bot 8129600138 !14543 FIX NB14 OPS
From: @mamba_ni
Reviewed-by: @wang_zi_dong,@ljl0711
Signed-off-by: @wang_zi_dong
2021-04-02 14:06:29 +08:00
huangbingjian 3ac9e94a9d fix pylint 2021-04-02 11:47:12 +08:00
ling 72e665c447 lite typo 2021-04-02 11:28:23 +08:00
yepei6 548225286d correct the grammar error 2021-04-02 11:12:25 +08:00
hwjiaorui ac69574b7c codex 2021-04-02 10:48:30 +08:00
mindspore-ci-bot 408b23380e !14574 show accurate error line when using a user-defined class
From: @zhangbuxue
Reviewed-by: @ginfung,@zh_qh
Signed-off-by: @zh_qh
2021-04-02 10:08:53 +08:00
mindspore-ci-bot a2edccd309 !14561 [MD][r1.2] Fix face quality performance issue
From: @xiefangqi
Reviewed-by: @liucunwei,@oacjiewen
Signed-off-by: @liucunwei
2021-04-02 09:46:30 +08:00
mindspore-ci-bot 1991387896 !14469 Fix codex and error info.
From: @liu_xiao_93
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-04-02 09:36:34 +08:00
mindspore-ci-bot cfe336e54e !14538 clean code
From: @HulkTang
Reviewed-by: @zhoufeng54,@jjfeing
Signed-off-by: @zhoufeng54
2021-04-02 09:11:15 +08:00
mindspore-ci-bot 168f5ed819 !14550 Add onnx exporter error info for r1.2
From: @liuyang_655
Reviewed-by: @zh_qh,@zhunaipan
Signed-off-by: @zh_qh
2021-04-02 08:54:28 +08:00
mindspore-ci-bot d26cf2c2be !14427 Adaptation run package 0330
From: @shenwei41
Reviewed-by: 
Signed-off-by:
2021-04-01 23:56:51 +08:00
xiefangqi 9a670476c7 Fix logger error problem 2021-04-01 22:50:48 +08:00
buxue 37132676c4 show accurate error line when using a user-defined class 2021-04-01 21:51:38 +08:00
mindspore-ci-bot 6cb94b0009 !14515 fix codedex warning
From: @yuchaojie
Reviewed-by: @kisnwang,@zhoufeng54
Signed-off-by: @zhoufeng54
2021-04-01 21:21:10 +08:00
mindspore-ci-bot c32d4bb33f !14545 modify python ut for 1.2
From: @changzherui
Reviewed-by: @xsmq,@zhoufeng54
Signed-off-by: @zhoufeng54
2021-04-01 21:14:43 +08:00
mindspore-ci-bot f26034cce0 !14495 remove third_party mnist_stm32f746 source codes
From: @zoloft
Reviewed-by: @wangchengyuan,@jpc_chenjianping
Signed-off-by: @wangchengyuan
2021-04-01 20:48:04 +08:00
shenwei41 effe323051 Adaptation run package 0330 2021-04-01 20:31:46 +08:00
liubuyu 8065e49b3b 3d graph reconstruct 2021-04-01 20:25:38 +08:00
mindspore-ci-bot 27c947265e !14336 print graph output gpu
From: @youui
Reviewed-by: @zhoufeng54,@kisnwang
Signed-off-by: @kisnwang
2021-04-01 20:18:59 +08:00
liuxiao93 f9e27c0219 fix codex and error info. 2021-04-01 20:10:02 +08:00
mamba_ni dc9ab64b71 fix nb14 ops 2021-04-01 19:24:51 +08:00
xiefangqi 0450b1c59f optimize face quality performance 2021-04-01 18:41:22 +08:00
shenwei41 efcb9d951f fix bug 2021-04-01 16:39:03 +08:00
liuyang_655 cee23b3c88 Add error info 2021-04-01 16:25:36 +08:00
changzherui 4ce36663a9 modify python ut for 1.2 2021-04-01 16:21:30 +08:00
mindspore-ci-bot 38b7c94a98 !14524 fix get seed validation
From: @gong_zi_yan
Reviewed-by: @zhunaipan,@stsuteng
Signed-off-by: @stsuteng
2021-04-01 15:12:26 +08:00
tanghuikang 3e9ca6e1ca clean code 2021-04-01 14:58:32 +08:00
mindspore-ci-bot 0190268af3 !14520 fix codex warning in somas
From: @laiyongqiang
Reviewed-by: @zhoufeng54,@kisnwang
Signed-off-by: @kisnwang
2021-04-01 14:50:52 +08:00
mindspore-ci-bot 9580016d46 !14392 Solve the example problem in comments
From: @lijiaqi0612
Reviewed-by: @kisnwang,@zh_qh,@zh_qh,@zhunaipan
Signed-off-by:
2021-04-01 14:44:04 +08:00
gzhcv 303768dc42 Remove the constraint that the GPU profiler must be initialized at a specific position 2021-04-01 14:32:20 +08:00
mindspore-ci-bot 1c76cb759e !14464 fix ssd 310 infer bug
From: @yuzhenhua666
Reviewed-by: @wuxuejian,@c_34
Signed-off-by: @wuxuejian
2021-04-01 14:32:10 +08:00
mindspore-ci-bot 1061b721cb !14373 FIX SPONGE PERFORMANCE
From: @mamba_ni
Reviewed-by: 
Signed-off-by:
2021-04-01 11:23:10 +08:00
Ziyan a88df2401f fix get seed validation 2021-04-01 10:50:59 +08:00
mindspore-ci-bot 9a016047b3 !14228 r1.2 Fix input error when axes is float or out of bounds
From: @anrui-wang
Reviewed-by: @liangchenghui
Signed-off-by: @liangchenghui
2021-04-01 10:32:22 +08:00
laiyongqiang 88f8c3effb fix codex warning in somas 2021-04-01 10:16:23 +08:00
mindspore-ci-bot f9801a79e5 !14510 update document of MatrixInverse
From: @mind-lh
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-04-01 10:08:16 +08:00
yuchaojie 913e586d65 fix codedex warning 2021-04-01 09:45:10 +08:00
liuhe bd0fde5dc8 add Raises section and a full stop to MatrixInverse 2021-04-01 09:43:07 +08:00
hwjiaorui 67c58e6e29 clean code 2021-04-01 09:42:13 +08:00
mindspore-ci-bot eabf675391 !14472 remove dup act
From: @zhaozhenlong
Reviewed-by: @zhang_xue_tong,@zhanghaibo5
Signed-off-by: @zhang_xue_tong
2021-04-01 09:31:32 +08:00
xulei2020 97b4443268 fix PythonTokenizer bug on Windows 2021-04-01 09:29:49 +08:00
mindspore-ci-bot 4d69af0641 !14462 show accurate error info and error lines when using an unsupported builtin function
From: @zhangbuxue
Reviewed-by: @ginfung,@zh_qh
Signed-off-by: @zh_qh
2021-04-01 09:18:12 +08:00
mindspore-ci-bot 8ba364e042 !14501 [MSLITE][Develop] Modify lite build and package script
From: @sunsuodong
Reviewed-by: @zhang_xue_tong,@zhanghaibo5
Signed-off-by: @zhang_xue_tong
2021-04-01 09:17:58 +08:00
mindspore-ci-bot 4f1f35a96e !14468 cleaning codex warnings
From: @nicholas_yhr
Reviewed-by: @kisnwang,@zhoufeng54
Signed-off-by: @zhoufeng54
2021-04-01 09:16:36 +08:00
caifubi 272600f6a2 clean codex 2021-04-01 09:15:53 +08:00
sunsuodong e8b8f61cd0 train so name 2021-03-31 21:19:19 +08:00
mindspore-ci-bot b07ab1d5a1 !14453 modify the parse module to adapt to python3.8 ast
From: @yepei6
Reviewed-by: @zh_qh,@kingxian
Signed-off-by: @zh_qh
2021-03-31 21:04:38 +08:00
mindspore-ci-bot fd8fe51a48 !14476 fix: memcpy_s will fail when size is larger than 2^31 - 1 on br:1.2
From: @jonyguo
Reviewed-by: @heleiwang,@liucunwei
Signed-off-by: @liucunwei
2021-03-31 19:28:32 +08:00
z00512249 ef1adf07f2 remove third_party mnist_stm32f746 source codes 2021-03-31 18:10:04 +08:00
Jiaqi 19edbac71f fix api bugs 2021-03-31 17:28:44 +08:00
mindspore-ci-bot 3dd79a6b52 !14486 fix docstrings
From: @jachua
Reviewed-by: @zhunaipan,@liangchenghui
Signed-off-by: @liangchenghui
2021-03-31 17:19:09 +08:00
mindspore-ci-bot ed7af5fb82 !14240 Remove wrong comment of laplace op
From: @pkuliuliu
Reviewed-by: @lilongfei15,@jxlang910,@zhunaipan
Signed-off-by: @zhunaipan
2021-03-31 16:51:59 +08:00
huangmengxi 084f13b59e edit 2021-03-31 16:44:16 +08:00
mindspore-ci-bot 38600625c0 !14471 remove task_id && fix warning
From: @zhujingxuan
Reviewed-by: @wangchengyuan,@zhanghaibo5
Signed-off-by: @wangchengyuan
2021-03-31 16:39:21 +08:00
mindspore-ci-bot c733af758b !14390 [MS][LITE][r1.2]fix input tensor check
From: @lx0095
Reviewed-by: @zhanghaibo5,@lilongfei15
Signed-off-by: @zhanghaibo5
2021-03-31 16:38:27 +08:00
mindspore-ci-bot 42f6d71560 !14448 fix codex
From: @lianliguang
Reviewed-by: @zhoufeng54,@chujinjin
Signed-off-by: @lilongfei15
2021-03-31 16:35:57 +08:00
mindspore-ci-bot 9262f7f5ac !14437 modify the import order for r1.2
From: @alouhahahahaha
Reviewed-by: @zhoufeng54,@jjfeing
Signed-off-by: @lilongfei15
2021-03-31 16:18:57 +08:00
yanghaoran 45f593311d cleaning codex warnings 2021-03-31 16:11:49 +08:00
mindspore-ci-bot a5f7348a73 !14477 fix effnet.py script
From: @xutianchun
Reviewed-by: @HilbertDavid,@zhanghaibo5
Signed-off-by: @HilbertDavid
2021-03-31 16:09:10 +08:00
mindspore-ci-bot 256555d8a5 !14383 fix np.arange with shape 0 tensor in it
From: @yanglf1121
Reviewed-by: @liangchenghui,@zhunaipan
Signed-off-by: @liangchenghui
2021-03-31 15:52:34 +08:00
xutianchun a2deb3de87 fix effnet.py 2021-03-31 15:49:32 +08:00
jonyguo 7a1eb7183e fix: memcpy_s will fail when size is larger than 2^31 - 1 2021-03-31 15:41:39 +08:00
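For context on this fix: the securec library's memcpy_s rejects byte counts above 2^31 - 1, so larger buffers must be copied in chunks. A minimal sketch of that workaround, assuming securec.h is available; ChunkedMemcpy and kMaxChunk are illustrative names, not the actual patch:

```cpp
#include <cstddef>

#include "securec.h"  // assumed available; provides memcpy_s

// Copy in chunks of at most 2^31 - 1 bytes, the largest count that
// securec's memcpy_s accepts.
constexpr size_t kMaxChunk = 0x7FFFFFFF;

int ChunkedMemcpy(void *dest, size_t dest_max, const void *src, size_t count) {
  auto *d = static_cast<char *>(dest);
  const auto *s = static_cast<const char *>(src);
  while (count > 0) {
    size_t chunk = count < kMaxChunk ? count : kMaxChunk;
    int ret = memcpy_s(d, dest_max, s, chunk);
    if (ret != 0) {
      return ret;  // propagate the securec error code
    }
    d += chunk;
    s += chunk;
    dest_max -= chunk;
    count -= chunk;
  }
  return 0;
}
```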
zhaozhenlong f238602d00 fix dup act in npu pass 2021-03-31 15:35:46 +08:00
zhujingxuan e73a448e3c remove task_id && fix warning 2021-03-31 15:30:06 +08:00
mindspore-ci-bot 3ab3fa5341 !14444 codedex clean
From: @liubuyu
Reviewed-by: @zhoufeng54,@kisnwang
Signed-off-by: @kisnwang
2021-03-31 15:14:29 +08:00
buxue 19847d7591 show accurate error info and error lines when using an unsupported builtin function 2021-03-31 14:56:06 +08:00
yuzhenhua c9028157ed fix ssd 310 infer acc bug 2021-03-31 14:55:35 +08:00
mamba_ni 3dff9e5050 sponge performance 2021-03-31 14:52:18 +08:00
mindspore-ci-bot 25cef5716e !14386 [MS][RDR] fix cpplint alarms
From: @louie5
Reviewed-by: @jjfeing,@ouwenchang
Signed-off-by: @lilongfei15
2021-03-31 14:45:34 +08:00
w00535372 54565b49e9 r1.2 Add input check for axes which is float type or out of bounds 2021-03-31 14:42:23 +08:00
mindspore-ci-bot 9d7130a784 !14443 [MD][r1.2] TFRecord parse failure did not print the file name
From: @xiefangqi
Reviewed-by: @liucunwei,@heleiwang
Signed-off-by: @liucunwei
2021-03-31 14:41:04 +08:00
yepei6 d6a5d42c61 modify the support of Ellipsis in parse for python3.8 2021-03-31 14:40:29 +08:00
LianLiguang 9af15415d2 fix codex && reviewbot 2021-03-31 12:37:32 +08:00
mindspore-ci-bot 2f3c0fb9b8 !14449 update document of NMSWithMask
From: @mind-lh
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-03-31 11:31:53 +08:00
liuhe b8590086ca update document of NMSWithMask 2021-03-31 11:14:44 +08:00
mindspore-ci-bot 2872602f4b !14432 [lite]tf add batchmatmulv2
From: @jianghui58
Reviewed-by: @hangangqiang,@jpc_chenjianping
Signed-off-by: @hangangqiang
2021-03-31 10:34:25 +08:00
liubuyu 4540ca0087 codedex clean 2021-03-31 10:16:31 +08:00
xiefangqi 9bea918b5f tfrecord file name log fix 2021-03-31 10:07:21 +08:00
mindspore-ci-bot b69e30d957 !14402 For r1.2, Modified pad validator of conv3d.
From: @liu_xiao_93
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-03-31 10:02:23 +08:00
alouhahaha d8b38157da modify import order 2021-03-31 09:48:20 +08:00
mindspore-ci-bot 37a451a65f !14421 modify python ut runtest.sh for r1.2
From: @changzherui
Reviewed-by: @xsmq,@zhunaipan
Signed-off-by: @zhunaipan
2021-03-31 09:38:38 +08:00
mindspore-ci-bot ec083ca5ed !14406 modify conv3d and conv3dtranspose for unet3d
From: @Somnus2020
Reviewed-by: @liangchenghui,@c_34
Signed-off-by: @liangchenghui
2021-03-31 09:29:25 +08:00
mindspore-ci-bot 34f647dc32 !14417 fix run on device bug
From: @zhoufeng54
Reviewed-by: @xu-yfei,@kisnwang
Signed-off-by: @xu-yfei
2021-03-31 09:25:14 +08:00
mindspore-ci-bot 5b211b9546 !14422 pypi link to github
From: @zhoufeng54
Reviewed-by: @leonwanghui,@xu-yfei
Signed-off-by: @xu-yfei
2021-03-31 09:23:00 +08:00
jianghui58 1f97d5a2d2 tf add batchmatmulv2 2021-03-31 09:22:31 +08:00
mindspore-ci-bot afde4e3bf9 !14400 [MS][LITE][GPU] optimize opencl performance on mate40
From: @wangdongxu6
Reviewed-by: @ddwsky,@hangangqiang
Signed-off-by: @ddwsky
2021-03-31 09:22:27 +08:00
mindspore-ci-bot 52a5821149 !14378 [MSLITE][DEVELOP] clear static check warnings in module of lite runtime op
From: @yangruoqi713
Reviewed-by: @zhang_xue_tong,@zhanghaibo5
Signed-off-by: @zhang_xue_tong
2021-03-31 09:15:32 +08:00
mindspore-ci-bot 0e85b4a9d8 !14374 [MS][LITE][Develop]fix CodingStyle warnings
From: @lx0095
Reviewed-by: @zhanghaibo5,@lilongfei15
Signed-off-by: @zhanghaibo5
2021-03-31 09:14:44 +08:00
lilei 197bc44737 modify Conv3d and Conv3dTranspose for unet3d 2021-03-30 22:02:59 +08:00
mindspore-ci-bot 7ba3325da1 !14396 add unsupported data type when get output value
From: @yelihua
Reviewed-by: @wenkai_dist,@lilongfei15
Signed-off-by: @lilongfei15
2021-03-30 21:49:03 +08:00
liuxiao93 9a4be0dd31 Modified pad validator of conv3d. 2021-03-30 21:41:54 +08:00
mindspore-ci-bot e5f2c98dcd !14135 Add get dataset size for album
From: @ezphlow
Reviewed-by: @robingrosman
Signed-off-by:
2021-03-30 21:07:52 +08:00
changzherui 5b3ffee257 modify python ut runtest.sh for r1.2 2021-03-30 20:54:35 +08:00
zhoufeng a3a87d0fa8 pypi link to github
Signed-off-by: zhoufeng <zhoufeng54@huawei.com>
2021-03-30 20:36:36 +08:00
zhoufeng b868ea4ab6 fix run on device bug
Signed-off-by: zhoufeng <zhoufeng54@huawei.com>
2021-03-30 20:13:30 +08:00
lvchangquan a2a08732ed fix a bug in launch allreduce 2021-03-30 19:46:05 +08:00
mindspore-ci-bot 14d2ebd0c9 !14185 fix the 310 infer of resnet18
From: @jiangzg001
Reviewed-by: @c_34,@wuxuejian
Signed-off-by: @c_34
2021-03-30 19:44:44 +08:00
mindspore-ci-bot d6a2f7ea69 !14256 Improve fasterrcnn network's performance.
From: @linqingke
Reviewed-by: @oacjiewen,@wuxuejian
Signed-off-by: @wuxuejian
2021-03-30 19:37:54 +08:00
wangdongxu 306761da3b optimize opencl conv2d on mate40 2021-03-30 19:00:01 +08:00
mindspore-ci-bot 2ad6a2d9cf !14333 update document of BroadcastTo, GatherD, UnsortedSegmentMax, etc.
From: @mind-lh
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-03-30 18:51:44 +08:00
mindspore-ci-bot 188a5c07fd !14385 nnacl bug
From: @ling_qiao_min
Reviewed-by: @zhanghaibo5,@zhang_xue_tong
Signed-off-by: @zhang_xue_tong
2021-03-30 18:47:24 +08:00
mindspore-ci-bot 1a07d46a16 !14371 GPU update addn
From: @VectorSL
Reviewed-by: @cristoval,@limingqi107
Signed-off-by: @limingqi107
2021-03-30 18:32:46 +08:00
mindspore-ci-bot 46ed0928b1 !14134 tensorprint_debug
From: @yepei6
Reviewed-by: 
Signed-off-by:
2021-03-30 17:51:53 +08:00
yelihua 92f9ddea05 add unsupported data type when get output value 2021-03-30 17:37:01 +08:00
yanglf1121 92bcb20591 fix shape 0 in np.arange 2021-03-30 17:32:03 +08:00
baihuawei 3716a0da1a add cpu mixed precision 2021-03-30 17:11:17 +08:00
mindspore-ci-bot cff34f34b5 !14355 fix codex warnings
From: @xutianchun
Reviewed-by: @HilbertDavid,@zhanghaibo5
Signed-off-by: @HilbertDavid
2021-03-30 16:55:55 +08:00
lixian 16c506c51f fix input tensor check 2021-03-30 16:51:17 +08:00
ling 0071855468 [MSLITE] nnacl common fix 2021-03-30 16:34:30 +08:00
yangruoqi713 10b028e94d [MSLITE][DEVELOP] clear static check warnings in module of lite runtime op 2021-03-30 16:32:10 +08:00
louei5 e90c59a7c5 fix cpplint alarms in rdr module 2021-03-30 16:29:52 +08:00
mindspore-ci-bot b1ce20ebf8 !14246 [MS][LITE][GPU] fix opencl infershape bug
From: @wangdongxu6
Reviewed-by: @ddwsky,@lilongfei15
Signed-off-by: @ddwsky
2021-03-30 16:16:26 +08:00
linqingke 1914bdb640 improve fasterrcnn network's performance. 2021-03-30 16:13:48 +08:00
mindspore-ci-bot 25f8f820c3 !14359 fix static check
From: @yeyunpeng2020
Reviewed-by: @ddwsky,@lilongfei15
Signed-off-by: @ddwsky
2021-03-30 16:05:42 +08:00
VectorSL 8407ccf455 addn support 5D 2021-03-30 14:59:22 +08:00
yepei6 9e68a1ae8b modify the tensorprint handle create process 2021-03-30 14:37:53 +08:00
lixian 92d0070fd2 fix coding style 2021-03-30 14:31:18 +08:00
yeyunpeng2020 b01077152e fix static check 2021-03-30 14:18:20 +08:00
mindspore-ci-bot 5f01223ca4 !14350 [MS_LITE] fix issue to r1.2
From: @YeFeng_24
Reviewed-by: @hangangqiang,@zhanghaibo5
Signed-off-by: @hangangqiang
2021-03-30 11:12:34 +08:00
liuhe 7862128ace update document of BroadcastTo, GatherD, UnsortedSegmentMax, etc. 2021-03-30 11:08:44 +08:00
mindspore-ci-bot 7c7b866af3 !14275 optimize exception mode when using an undefined name in if, for and while
From: @zhangbuxue
Reviewed-by: @ginfung,@zh_qh
Signed-off-by: @zhunaipan
2021-03-30 11:05:06 +08:00
xutianchun bdd772af16 fix codex 2021-03-30 10:46:47 +08:00
Eric ba4d679149 Added get dataset size and fix macro 2021-03-29 22:02:29 -04:00
yefeng c432bc7207 fix_issue_to_r1.2 2021-03-30 09:29:02 +08:00
mindspore-ci-bot 6a5dc3f568 !14342 Change profiling reporter max Length to 1024
From: @jojobugfree
Reviewed-by: @zhoufeng54,@chujinjin
Signed-off-by: @chujinjin
2021-03-30 09:28:58 +08:00
RobinGrosman be56ef6efc codex and clint fix for parallel_op.cc and cpu_sampling.cc 2021-03-29 17:47:47 -07:00
caifubi 29d120a745 Change profiling reporter max len to 1024 2021-03-29 21:03:34 +08:00
yangwei 9d45c4c711 gpu_tuple 2021-03-29 19:47:11 +08:00
mindspore-ci-bot 1b9a5563e5 !14310 [MS][LITE][r1.2]fix exp for big-endian devices
From: @lx0095
Reviewed-by: @zhanghaibo5,@lilongfei15
Signed-off-by: @lilongfei15
2021-03-29 17:43:04 +08:00
mindspore-ci-bot 74a1f689e5 !14177 [MS][LITE][r1.2]fix name of tensor from StringToTensor
From: @lx0095
Reviewed-by: @zhang_xue_tong
Signed-off-by: @zhang_xue_tong
2021-03-29 17:41:28 +08:00
mindspore-ci-bot ed5891b382 !14286 numpy-native docstrings add mgrid ogrid not supported in graph mode
From: @jachua
Reviewed-by: @liangchenghui,@zhunaipan
Signed-off-by: @liangchenghui
2021-03-29 17:01:54 +08:00
mindspore-ci-bot 94beb55913 !14321 update pip command
From: @shenwei41
Reviewed-by: @ljl0711,@heleiwang,@lilongfei15
Signed-off-by: @lilongfei15
2021-03-29 16:48:14 +08:00
shenwei41 1c68c2b61f update pip command 2021-03-29 16:08:47 +08:00
mindspore-ci-bot f9632a1e71 !14225 Add pass for the input and output of the batchnorm reverse operator on the Ascend platform in PyNative mode
From: @ding_fei_fei
Reviewed-by: @kingxian,@zhunaipan
Signed-off-by: @kingxian
2021-03-29 15:58:32 +08:00
mindspore-ci-bot 13c238a9ae !14290 fix sponge ops r1.2
From: @jiahongqian
Reviewed-by: @ljl0711,@wang_zi_dong
Signed-off-by: @wang_zi_dong
2021-03-29 15:57:42 +08:00
lixian 743ff9b5c7 fix exp for big-endian devices 2021-03-29 15:44:57 +08:00
mindspore-ci-bot 5bb9e8ad9f !14296 fix demo url bug
From: @yeyunpeng2020
Reviewed-by: @zhanghaibo5,@lilongfei15
Signed-off-by: @lilongfei15
2021-03-29 15:38:12 +08:00
mindspore-ci-bot e000b03046 !14257 java support linux x86 avx & sse
From: @yeyunpeng2020
Reviewed-by: @zhanghaibo5,@lilongfei15
Signed-off-by: @lilongfei15
2021-03-29 15:37:48 +08:00
mindspore-ci-bot 59b81166c3 !14059 dataset: add stop_dataset_profiler option
From: @ms_yan
Reviewed-by: 
Signed-off-by:
2021-03-29 15:20:43 +08:00
mindspore-ci-bot d775166f57 !14300 change PR links in project to public links
From: @luopengting
Reviewed-by: @yelihua,@wenkai_dist
Signed-off-by: @wenkai_dist
2021-03-29 15:07:33 +08:00
luopengting a50d4f22df fix PR links in project to public links 2021-03-29 14:35:41 +08:00
mindspore-ci-bot 98eb7cf0bd !14299 update document and example of BNTrainingUpdate, NLLLoss, ResizeNearestNeighbor,etc.
From: @mind-lh
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-03-29 14:34:07 +08:00
mindspore-ci-bot 37697afa36 !14278 For r1.2, fix some bug of API of Conv3d, Conv3dTranspose.
From: @liu_xiao_93
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-03-29 14:21:30 +08:00
liuhe 3ea4d5508f rm space of BatchMatMul, update examples of BNTrainingUpdate and NLLLoss 2021-03-29 14:14:36 +08:00
yeyunpeng2020 ddd1a89f21 fix demo url 2021-03-29 14:04:53 +08:00
mindspore-ci-bot 2457fe476f !14259 Fix BN docs
From: @jojobugfree
Reviewed-by: @zhoufeng54,@jjfeing
Signed-off-by: @jjfeing
2021-03-29 13:45:15 +08:00
q00596439 a65f2072c3 03292 fix sponge ops r1.2 2021-03-29 11:28:59 +08:00
huangmengxi 85f39a0b50 fix docstrings 2021-03-29 11:12:52 +08:00
mindspore-ci-bot bc76d35a42 !14132 fix cmsis gesture of pooling opcoder
From: @zoloft
Reviewed-by: @wangchengyuan
Signed-off-by: @wangchengyuan
2021-03-29 11:09:30 +08:00
mindspore-ci-bot 4d54c21ac7 !14262 tensorprint adapt to print scalar
From: @yepei6
Reviewed-by: @kingxian,@kisnwang
Signed-off-by: @kingxian
2021-03-29 10:51:10 +08:00
wangdongxu 8b48b6bf0c fix opencl infershape bug 2021-03-29 10:31:27 +08:00
mindspore-ci-bot 7a56ee2149 !14161 [r1.2] Update API document example of ScatterNDUpdate
From: @yuyiyang_3418
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-03-29 09:52:34 +08:00
liuxiao93 e21eac9659 fix-some-api-bug 2021-03-29 09:47:22 +08:00
buxue 016a8a12b8 optimize exception mode when using an undefined name in if, for and while 2021-03-29 09:02:41 +08:00
mindspore-ci-bot 35b66bc0a8 !14105 Add validation and modify the example code in the comment. The code and output result are wrong.
From: @lijiaqi0612
Reviewed-by: 
Signed-off-by:
2021-03-29 08:54:32 +08:00
yeyunpeng2020 40602be929 java support linux x86 avx & sse 2021-03-27 22:52:51 +08:00
yepei6 16ad587915 tensorprint adapt to print scalar 2021-03-27 19:01:52 +08:00
caifubi 882d848272 fix BN docs 2021-03-27 18:27:47 +08:00
dingpeifei 0f352fc6a7 Add pass for the input and output of the batchnorm reverse operator on the Ascend platform in PyNative mode 2021-03-27 16:43:30 +08:00
ms_yan feb2e6f59d add stop_dataset_profiler option 2021-03-27 16:40:04 +08:00
mindspore-ci-bot 48ed9e9d82 !14242 update release notes about ControlDepend and Dataset
From: @luopengting
Reviewed-by: @lilongfei15,@ouwenchang
Signed-off-by: @lilongfei15
2021-03-27 16:22:29 +08:00
pkuliuliu 564a30c149 remove wrong comment of laplace op 2021-03-27 16:21:04 +08:00
mindspore-ci-bot 5645b7b93b !14188 add platform for connect_network_with_dataset
From: @wangnan39
Reviewed-by: @zhunaipan,@zh_qh
Signed-off-by: @zh_qh
2021-03-27 16:06:55 +08:00
lixian eb73921491 fix name of tensor from StringToTensor 2021-03-27 16:02:42 +08:00
mindspore-ci-bot 689ee712e9 !14221 fix gpu shape bug
From: @yeyunpeng2020
Reviewed-by: @ddwsky,@jpc_chenjianping
Signed-off-by: @ddwsky
2021-03-27 15:59:04 +08:00
luopengting dd7858ef2c update release notes about ControlDepend and Dataset 2021-03-27 15:43:34 +08:00
Jiaqi c371834840 modify bug 2021-03-27 15:41:11 +08:00
mindspore-ci-bot 322776ef5e !14222 update release notes for 1.2.0-rc1
From: @luopengting
Reviewed-by: @majorzhang,@lilongfei15
Signed-off-by: @lilongfei15
2021-03-27 15:10:17 +08:00
luopengting be374a79a6 update release for 1.2.0-rc1 2021-03-27 14:47:00 +08:00
yeyunpeng2020 e303037429 fix gpu shape bug 2021-03-27 14:28:34 +08:00
mindspore-ci-bot e4b70e5a55 !14215 [MSLITE][DEVELOP] update lite runtime data for version r1.2
From: @yangruoqi713
Reviewed-by: @hangangqiang,@zhang_xue_tong
Signed-off-by: @zhang_xue_tong
2021-03-27 12:37:12 +08:00
yangruoqi713 ed1200683b [MSLITE][DEVELOP] update lite runtime data for version r1.2 on Mate40 (Hisilicon Kirin9000) 2021-03-27 12:26:12 +08:00
mindspore-ci-bot bbf92f0a48 !14137 Update case for interface forwardvalueandgrad
From: @joylvliang
Reviewed-by: @ginfung,@zhunaipan
Signed-off-by: @ginfung
2021-03-27 09:35:07 +08:00
mindspore-ci-bot c0314f9332 !14191 add the force transform to avoid the utf8 error
From: @yepei6
Reviewed-by: @kingxian,@kisnwang
Signed-off-by: @kingxian
2021-03-27 09:33:15 +08:00
mindspore-ci-bot 55db218a79 !14170 Fix FaceAttribute net bug
From: @zhanghuiyao
Reviewed-by: @c_34,@oacjiewen
Signed-off-by: @c_34
2021-03-26 22:29:09 +08:00
mindspore-ci-bot db19354ce1 !14199 revert pr13787
From: @xsmq
Reviewed-by: @lilongfei15,@kingxian
Signed-off-by: @lilongfei15
2021-03-26 20:29:56 +08:00
mindspore-ci-bot 0609846a81 !14197 add release notes for 1.2.0-rc1
From: @luopengting
Reviewed-by: @xsmq,@lilongfei15
Signed-off-by: @lilongfei15
2021-03-26 20:28:28 +08:00
mindspore-ci-bot 93641259bb !14148 update sponge
From: @zhangxinfeng3
Reviewed-by: 
Signed-off-by:
2021-03-26 20:17:57 +08:00
jiangzhenguang 7fd5508f59 fix res18 310 infer 2021-03-26 19:50:36 +08:00
luopengting 34d5d32c0e add r1.2.0-rc1 Release Notes 2021-03-26 19:34:32 +08:00
zhangxinfeng3 4da6a87d71 update sponge 2021-03-26 18:58:31 +08:00
lilongfei ffe99439c9 Revert 'Pull Request !13787 : Add pass for the input and output of the batchnorm reverse operator on the Ascend platform in PyNative mode' 2021-03-26 18:39:45 +08:00
mindspore-ci-bot 72c3317c4f !13787 Add pass for the input and output of the batchnorm reverse operator on the Ascend platform in PyNative mode
From: @ding_fei_fei
Reviewed-by: @kingxian,@kingxian,@lilongfei15
Signed-off-by: @kingxian,@kingxian
2021-03-26 18:10:36 +08:00
mindspore-ci-bot 5472e704bf !14194 update the version from 1.2.0 to 1.2.0-rc1
From: @luopengting
Reviewed-by: @xsmq,@lilongfei15
Signed-off-by: @lilongfei15
2021-03-26 18:08:55 +08:00
luopengting 0d1d8e68f2 change the version from 1.2.0 to 1.2.0-rc1 2021-03-26 17:45:44 +08:00
z00512249 8fa4974390 update stm32f746 readme && fix pooling opcoders 2021-03-26 17:29:33 +08:00
yepei6 101ddcc7b1 use the transform to avoid the utf8 error 2021-03-26 17:19:34 +08:00
mindspore-ci-bot 149ca9721b !14079 rm space of BatchMatMul
From: @mind-lh
Reviewed-by: @wuxuejian,@lilongfei15,@ljl0711
Signed-off-by: @wuxuejian,@ljl0711
2021-03-26 16:56:28 +08:00
mindspore-ci-bot 73fd1313d5 !14181 del NoRepeatNGram space
From: @yanzhenxiang2020
Reviewed-by: @wuxuejian,@c_34
Signed-off-by: @wuxuejian,@c_34
2021-03-26 16:37:49 +08:00
wangnan39@huawei.com f6821513f4 add platform for connect_network_with_dataset 2021-03-26 16:34:11 +08:00
mindspore-ci-bot 541809f173 !14140 Improve Reduce/Transpose/TensorAdd CPU kernel performance!
From: @yang_chun
Reviewed-by: @c_34,@wuxuejian
Signed-off-by: @c_34
2021-03-26 16:34:10 +08:00
mindspore-ci-bot 74f258b3bf !14074 r1.2 Fix float16 support problem and add raises for Batch Dot ops
From: @anrui-wang
Reviewed-by: @c_34,@wuxuejian
Signed-off-by: @wuxuejian
2021-03-26 16:33:54 +08:00
mindspore-ci-bot 2515d38fef !14165 update GE commit id to r1.2 0325
From: @shenwei41
Reviewed-by: @xsmq,@majorzhang
Signed-off-by: @majorzhang
2021-03-26 16:13:14 +08:00
mindspore-ci-bot b7f53a573b !14042 Add origin_shape to data dump proto
From: @jojobugfree
Reviewed-by: 
Signed-off-by:
2021-03-26 16:09:18 +08:00
zhanghuiyao fe3ddb4412 fix faceattribute bug 2021-03-26 16:07:12 +08:00
yanzhenxiang2020 955b2c5ebf del NoRepeatNGram space 2021-03-26 15:53:52 +08:00
mindspore-ci-bot 5c48f331c4 !14123 cast maskrcnn datatype from float16 to float32
From: @gengdongjie
Reviewed-by: @c_34,@oacjiewen
Signed-off-by: @c_34
2021-03-26 15:22:07 +08:00
mindspore-ci-bot b43bbcc1e1 !14150 Unfold repeated label index in labelswitch
From: @liangzelang
Reviewed-by: @lilongfei15,@xsmq
Signed-off-by: @lilongfei15
2021-03-26 15:05:01 +08:00
mindspore-ci-bot ee18d1a18b !14147 Increase exception capture in checking cuda version
From: @gaoyong10
Reviewed-by: @wilfchen,@cristoval
Signed-off-by: @cristoval
2021-03-26 14:28:39 +08:00
mindspore-ci-bot 5cab278038 !14157 add test_lenet_quant.py testcase
From: @zlq2020
Reviewed-by: @xsmq,@lilongfei15
Signed-off-by: @lilongfei15
2021-03-26 14:20:55 +08:00
mindspore-ci-bot 31780edd45 !14162 sponge docs
From: @jiahongqian
Reviewed-by: @ljl0711,@wang_zi_dong
Signed-off-by: @ljl0711
2021-03-26 14:10:53 +08:00
lujiale f13224ba88 !14142 march native caused core dump
Merge pull request !14142 from donghufeng/r1.2
2021-03-26 11:48:20 +08:00
yuyiyang_3418 ce1ddaa6a9 Update API document of ScatterNDUpdate 2021-03-26 11:44:19 +08:00
shenwei41 288da60b0f update commit id to r1.2 0325 2021-03-26 11:35:42 +08:00
mindspore-ci-bot b83a09f3db !13963 [MS][RDR] fix fps degradation of gpu training
From: @louie5
Reviewed-by: @ouwenchang,@ouwenchang
Signed-off-by:
2021-03-26 11:25:34 +08:00
lilongfei 7b3cbdb609 !14127 Revert "fix bug for circular reference"
Merge pull request !14127 from zhoufeng/revertt
2021-03-26 11:24:59 +08:00
lilongfei 39fbf69674 !14114 fix lstm in pynative mode
Merge pull request !14114 from baihuawei/fixpylstm1.2
2021-03-26 11:24:36 +08:00
q00596439 61e96f2167 0326 sponge docs 2021-03-26 11:11:35 +08:00
liangzelang 8bfa130d5b unfold repeated label in labelswitches 2021-03-26 11:10:14 +08:00
zlq2020 40deecd153 add test_lenet_quant.py testcase 2021-03-26 10:53:15 +08:00
gaoyong10 930a5a19c6 Increase exception capture in checking cuda version 2021-03-26 10:21:33 +08:00
donghufeng 7465a98cd6 march native caused core dump 2021-03-26 09:57:16 +08:00
yang_chun fb1cfc40e3 Improve Reduce/Transpose/TensorAdd CPU kernel performance! 2021-03-26 09:45:51 +08:00
mindspore-ci-bot 720c571975 !14120 fix sponge ops
From: @jiahongqian
Reviewed-by: @majorzhang,@ljl0711
Signed-off-by: @majorzhang
2021-03-26 09:43:34 +08:00
lvliang 6b7e591fd8 update_case_for_interface_forwardvalueandgrad 2021-03-26 09:37:38 +08:00
liuhe b21d57844b rm space of BatchMatMul, update examples of BNTrainingUpdate and NLLLoss 2021-03-26 09:10:21 +08:00
mindspore-ci-bot 331de218f1 !14072 for r1.2, Add nn.Conv3d and nn.Conv3dTranspose.
From: @liu_xiao_93
Reviewed-by: @liangchenghui,@c_34
Signed-off-by: @liangchenghui,@c_34
2021-03-26 09:09:51 +08:00
mindspore-ci-bot 3509ddf617 !14111 Fixes numpy.take fail in graph mode
From: @wangrao124
Reviewed-by: @c_34,@liangchenghui
Signed-off-by: @c_34
2021-03-26 07:40:49 +08:00
mindspore-ci-bot 83baa8f6a9 !14020 fix RandomChoiceWithMask's random property in r1.2
From: @TFbunny
Reviewed-by: @robingrosman,@robingrosman
Signed-off-by: @robingrosman
2021-03-26 05:32:22 +08:00
TFBunny ddf103ed16 fix randomchoicewithmask's random property 2021-03-25 14:31:08 -04:00
mindspore-ci-bot 9a62a3c0b4 !14076 Update CPU Supported NN
From: @yanzhenxiang2020
Reviewed-by: @wuxuejian,@liangchenghui
Signed-off-by: @wuxuejian,@liangchenghui
2021-03-25 23:07:44 +08:00
mindspore-ci-bot 496ba58ba3 !14104 reduce ut time of cutmix_batch and ConcatOp
From: @luoyang42
Reviewed-by: @liucunwei,@pandoublefeng
Signed-off-by: @liucunwei
2021-03-25 22:58:05 +08:00
zhoufeng 49d48f38ae Revert "fix bug for circular reference"
This reverts commit a511b26414.
2021-03-25 22:24:05 +08:00
mindspore-ci-bot 1275124f60 !14066 tensorprint segmentation
From: @yepei6
Reviewed-by: @zhunaipan,@kingxian
Signed-off-by: @kingxian
2021-03-25 22:10:56 +08:00
mindspore-ci-bot b36a926e66 !14069 Upload the code of PanGu1 Model
From: @huangxinjing
Reviewed-by: 
Signed-off-by:
2021-03-25 22:09:55 +08:00
YangLuo 5b8fae497e reduce ut time of cutmix_batch 2021-03-25 21:59:25 +08:00
mindspore-ci-bot 9fde327048 !14082 revert Remove useless updatestate pattern and fix loss and J order
From: @linqingke
Reviewed-by: @zhunaipan,@c_34
Signed-off-by: @c_34
2021-03-25 21:56:44 +08:00
gengdongjie f96ae7eead cast maskrcnn datatype from float16 to float32 2021-03-25 21:00:42 +08:00
q00596439 46041d7736 1.2 03252 sponge ops zoo cu 2021-03-25 20:59:52 +08:00
mindspore-ci-bot d44b2ec016 !13896 Add float64 support to cumsum
From: @peilin-wang
Reviewed-by: @liangchenghui,@c_34
Signed-off-by: @liangchenghui
2021-03-25 20:58:35 +08:00
mindspore-ci-bot 9eaf11e525 !14100 modify nn annotation
From: @changzherui
Reviewed-by: @kingxian,@zhoufeng54
Signed-off-by: @kingxian
2021-03-25 20:58:30 +08:00
mindspore-ci-bot cd1632dc5f !14099 rm cpu support of Range
From: @mind-lh
Reviewed-by: @wuxuejian,@oacjiewen
Signed-off-by: @wuxuejian
2021-03-25 20:57:10 +08:00
huangxinjing ac3037530d Update the code of PanGu1 model 2021-03-25 20:48:52 +08:00
mindspore-ci-bot 176e6f0b75 !13897 Add fix to IndexAdd GPU in r1.2
From: @TFbunny
Reviewed-by: @robingrosman,@tom__chen
Signed-off-by: @liangchenghui
2021-03-25 20:47:59 +08:00
baihuawei 8430ca147c fix lstm in pynative mode 2021-03-25 20:46:22 +08:00
mindspore-ci-bot 1b6c817b7f !13909 change resnet101 allreduce split
From: @zhao_ting_v
Reviewed-by: 
Signed-off-by:
2021-03-25 20:35:46 +08:00
mindspore-ci-bot 9f6f5e9d0d !14051 add air note in r1.2
From: @caozhou_huawei
Reviewed-by: @zh_qh,@kingxian
Signed-off-by: @kingxian
2021-03-25 20:33:07 +08:00
mindspore-ci-bot 9a4cec5389 !14034 Update CPU Supported OPS
From: @yanzhenxiang2020
Reviewed-by: @wuxuejian,@liangchenghui
Signed-off-by: @wuxuejian,@liangchenghui
2021-03-25 20:32:02 +08:00
mindspore-ci-bot 7a4c8ba61b !13954 fix NoRepeatNGram example
From: @yanzhenxiang2020
Reviewed-by: @wuxuejian,@liangchenghui
Signed-off-by: @wuxuejian
2021-03-25 20:31:02 +08:00
mindspore-ci-bot 4f8c60670c !14073 add pad and strided slice fusion npu
From: @zhaozhenlong
Reviewed-by: @zhang_xue_tong,@zhanghaibo5
Signed-off-by: @zhang_xue_tong
2021-03-25 20:30:09 +08:00
mindspore-ci-bot 071884fc3f !14032 add atomic clean to clear allreduce input addr
From: @lvchangquan
Reviewed-by: @kisnwang,@chujinjin
Signed-off-by: @chujinjin
2021-03-25 19:59:25 +08:00
mindspore-ci-bot c008eb0aa1 !13977 fix numpy docstring errors
From: @yanglf1121
Reviewed-by: @liangchenghui,@c_34
Signed-off-by: @liangchenghui,@c_34
2021-03-25 19:53:21 +08:00
mindspore-ci-bot dd33bccc33 !14096 fix conv op
From: @zhujingxuan
Reviewed-by: @wangchengyuan,@zhanghaibo5
Signed-off-by: @wangchengyuan
2021-03-25 19:48:25 +08:00
wangrao 76a70e63bc add constexpr for unimplemented error 2021-03-25 19:47:20 +08:00
mindspore-ci-bot adccbe4499 !14016 Keep states between hyper map inner ops.
From: @zh_qh
Reviewed-by: @zhunaipan,@ginfung
Signed-off-by: @ginfung
2021-03-25 19:42:04 +08:00
mindspore-ci-bot 2c72cc1ecc !14028 add kwargs for vgg16
From: @caojian05
Reviewed-by: @wuxuejian,@oacjiewen
Signed-off-by: @wuxuejian
2021-03-25 19:38:08 +08:00
mindspore-ci-bot 96bbb6a39c !13813 fix centernet loss error.
From: @caojian05
Reviewed-by: @wuxuejian,@oacjiewen
Signed-off-by: @wuxuejian
2021-03-25 19:37:48 +08:00
liuxiao93 5c480609da Add nn.Conv3d and nn.Conv3dTranspose. 2021-03-25 19:12:24 +08:00
changzherui fca9cd3a10 modify nn annotation 2021-03-25 18:43:50 +08:00
zhujingxuan 406cbbb4e5 fix conv op && parallel issue 2021-03-25 18:38:20 +08:00
zhaozhenlong 66ab171ebc scale op and fix concat pad split stridedslice pass npu 2021-03-25 18:38:15 +08:00
mindspore-ci-bot 48e2ede91d !14091 update stm32f746 example codes
From: @zoloft
Reviewed-by: @wangchengyuan,@zhanghaibo5
Signed-off-by: @wangchengyuan
2021-03-25 18:31:47 +08:00
liuhe 179fc3487c rm cpu support of Range 2021-03-25 17:36:15 +08:00
mindspore-ci-bot 2a79ed5bb3 !13960 fix GOMO model train failed
From: @wangmin0104
Reviewed-by: @kisnwang,@wang_zi_dong
Signed-off-by: @wang_zi_dong
2021-03-25 17:20:02 +08:00
mindspore-ci-bot 245fc4252a !13940 [Lightweight PR]: update model_zoo/research/hpc/sponge/README.md.
From: @gao_hyp_xyj_admin
Reviewed-by: @ljl0711,@wang_zi_dong
Signed-off-by: @ljl0711
2021-03-25 17:19:14 +08:00
mindspore-ci-bot 87b3770c24 !14070 support encoder fp16
From: @cjh9368
Reviewed-by: @hangangqiang,@zhanghaibo5
Signed-off-by: @hangangqiang
2021-03-25 17:16:00 +08:00
louei5 8f8c7d2e0e optimize record gpu memory information 2021-03-25 17:13:01 +08:00
z00512249 cc349a4dc2 update stm32f746 example codes 2021-03-25 17:10:31 +08:00
mindspore-ci-bot e8167083e2 !14060 [MSLITE] Fix the bug of matmul int8 for arm32
From: @zhanyuan1
Reviewed-by: @zhang_xue_tong,@zhanghaibo5
Signed-off-by: @zhang_xue_tong
2021-03-25 16:56:25 +08:00
mindspore-ci-bot 1f729451a3 !14009 modify BERT/TinyBERT README and modify TinyBERT network script
From: @wang_hua_2019
Reviewed-by: @linqingke,@c_34
Signed-off-by: @c_34
2021-03-25 16:55:13 +08:00
mindspore-ci-bot e46ff26e75 !14026 [MS][LITE][CPU]transpose bug fix
From: @lzkcode
Reviewed-by: @zhang_xue_tong,@zhanghaibo5
Signed-off-by: @zhang_xue_tong
2021-03-25 16:44:54 +08:00
linqingke be07d46a87 revert Remove useless updatestate pattern and fix loss and J order 2021-03-25 16:19:05 +08:00
mindspore-ci-bot 2af439efe8 !13959 Enable zero dimension when it comes out of operator.
From: @liangzhibo
Reviewed-by: @zh_qh
Signed-off-by: @zh_qh
2021-03-25 16:15:20 +08:00
cjh9368 738598af28 [MS][LITE] support encoder fp16 2021-03-25 16:07:42 +08:00
mindspore-ci-bot 568f3ceefd !13984 optimize execute order for commops
From: @kisnwang
Reviewed-by: @zh_qh,@jjfeing
Signed-off-by: @zh_qh
2021-03-25 16:04:29 +08:00
mindspore-ci-bot 7e24d2445f !14064 fix make equal less npu-unsupported case
From: @zhaozhenlong
Reviewed-by: @zhang_xue_tong,@zhanghaibo5
Signed-off-by: @zhang_xue_tong
2021-03-25 15:55:14 +08:00
mindspore-ci-bot 06ae82c07f !13910 Modify yolov4 eval/test shape for r1.2
From: @zhanghuiyao
Reviewed-by: @c_34,@oacjiewen
Signed-off-by: @c_34
2021-03-25 15:51:46 +08:00
mindspore-ci-bot 981c26e63f !14033 update mnist_x86 example codes
From: @zoloft
Reviewed-by: @wangchengyuan,@hangangqiang
Signed-off-by: @wangchengyuan
2021-03-25 15:40:48 +08:00
yanzhenxiang2020 98a4d510fd Update CPU Supported NN 2021-03-25 15:30:59 +08:00
w00535372 1ae00a01da Bug fix for ISSUE #I3CN9Q 2021-03-25 15:24:54 +08:00
mindspore-ci-bot 0f833b595a !13772 fix the bug that SeLU only supports the float16 and float32 data types and update the documentation of the Mish operator.
From: @wangshuide2020
Reviewed-by: @wuxuejian,@liangchenghui
Signed-off-by: @liangchenghui
2021-03-25 15:13:07 +08:00
mindspore-ci-bot da43e9f5c8 !14004 [MD][Profiling][r1.2] Fix Device ID profiling error
From: @xiefangqi
Reviewed-by: 
Signed-off-by:
2021-03-25 15:05:09 +08:00
mindspore-ci-bot 20a7b5cfea !13858 deeplabv3 & ssd & lstm new interface r1.2
From: @zhangxiaoxiao16
Reviewed-by: 
Signed-off-by:
2021-03-25 15:03:29 +08:00
mindspore-ci-bot 722fa09edb !13982 update interface for yolov4 and unet 310 infer
From: @lihongkang1
Reviewed-by: @c_34,@oacjiewen
Signed-off-by: @c_34
2021-03-25 15:02:12 +08:00
mindspore-ci-bot f8db0dd246 !13862 fix 310 inference code compile error; fix maskrcnn coredump
From: @yuzhenhua666
Reviewed-by: 
Signed-off-by:
2021-03-25 15:01:42 +08:00
mindspore-ci-bot 8dde88f97f !14027 Change all io_format in r1.2
From: @liangzhibo
Reviewed-by: @ginfung,@zh_qh
Signed-off-by: @zh_qh
2021-03-25 14:45:17 +08:00
yepei6 9be9435c49 add the force cast to avoid the segmentation fault 2021-03-25 14:38:35 +08:00
mindspore-ci-bot 49568d61d2 !13995 fix conv1x1
From: @zhujingxuan
Reviewed-by: @wangchengyuan,@wangchengyuan,@hangangqiang
Signed-off-by: @wangchengyuan,@wangchengyuan
2021-03-25 14:38:11 +08:00
mindspore-ci-bot 32f7887198 !14048 [MS][LITE][r1.2]fix use of bind thread
From: @lx0095
Reviewed-by: @hangangqiang,@zhang_xue_tong,@zhanghaibo5
Signed-off-by: @hangangqiang
2021-03-25 14:34:26 +08:00
mindspore-ci-bot b702f806f1 !14022 fix demo bug & optimize java
From: @yeyunpeng2020
Reviewed-by: @zhanghaibo5,@hangangqiang
Signed-off-by: @hangangqiang
2021-03-25 14:30:50 +08:00
mindspore-ci-bot b16465360f !14019 add unet3D to r1.2
From: @Somnus2020
Reviewed-by: @linqingke,@c_34
Signed-off-by: @c_34
2021-03-25 14:23:00 +08:00
zhaozhenlong a0c1cfab22 fix equal 2d less 1d segmentation fault 2021-03-25 14:18:41 +08:00
mindspore-ci-bot beb855a4a3 !14041 [MSLITE][DEVELOP] fix bug of npu convolution
From: @yangruoqi713
Reviewed-by: @zhang_xue_tong,@zhanghaibo5
Signed-off-by: @zhang_xue_tong
2021-03-25 14:15:01 +08:00
caifubi e3932a3de7 add origin_shape for dump 2021-03-25 14:05:26 +08:00
zhanyuan 48da8d47a4 [MSLITE] Fix the bug of matmul int8 for arm32 2021-03-25 13:27:25 +08:00
l00591931 88e5658587 Tensor zero dimension 2021-03-25 12:39:32 +08:00
mindspore-ci-bot 5a024ecdd8 !13916 [ModelZoo]Add tprr 8p version to r1.2
From: @zhan_ke
Reviewed-by: @c_34,@wang_zi_dong
Signed-off-by: @c_34
2021-03-25 11:26:22 +08:00
wuxuejian 8f338c3f52 Update CPU Supported OPS 2021-03-25 11:19:34 +08:00
mindspore-ci-bot 5fa9840a59 !13905 Add RANK_TABLE_FILE for PyNative data parallel
From: @jojobugfree
Reviewed-by: @kisnwang,@kisnwang,@jjfeing
Signed-off-by: @jjfeing
2021-03-25 11:19:21 +08:00
lilei 91bed433b0 add unet3d to R1.2 2021-03-25 11:17:34 +08:00
lvchangquan 1531271623 use op atomic clean to clear input addr in launch allreduce 2021-03-25 11:11:37 +08:00
caozhou 16d2289979 add air note r1.2 2021-03-25 11:10:50 +08:00
lixian 0f2acd10b4 fix bind thread 2021-03-25 11:04:30 +08:00
mindspore-ci-bot 8edc24700b !14005 Ignore transpose with nop_op attr in pynative infer
From: @HulkTang
Reviewed-by: @chujinjin,@jjfeing
Signed-off-by: @chujinjin,@jjfeing
2021-03-25 10:51:26 +08:00
zhujingxuan cc858f6f72 fix 1x1 2021-03-25 10:50:30 +08:00
yangruoqi713 49e5a32606 [MSLITE][DEVELOP] fix bug of npu convolution 2021-03-25 10:45:16 +08:00
dingpeifei 8602e97923 Add pass for the input and output of the batchnorm reverse operator on the Ascend platform in PyNative mode 2021-03-25 10:31:00 +08:00
mindspore-ci-bot 2c083bc7eb !13937 library size optimization for 310
From: @zhoufeng54
Reviewed-by: @xu-yfei,@kisnwang
Signed-off-by: @xu-yfei
2021-03-25 10:05:47 +08:00
z00512249 b243633551 update mnist_x86 example codes 2021-03-25 10:05:07 +08:00
huangmengxi 6c6c9ebd6c fix code docs 2021-03-25 10:04:00 +08:00
yeyunpeng2020 a7970b8320 fix demo bug 2021-03-25 09:52:46 +08:00
l00591931 50eea9fee8 Change all io_format in master 2021-03-25 09:49:13 +08:00
mindspore-ci-bot e5a8d6abdd !13856 clear memory after run op in CPU
From: @simson_wu
Reviewed-by: 
Signed-off-by:
2021-03-25 09:37:18 +08:00
lzk 889b1116e7 transpose bug fix 2021-03-24 18:36:33 -07:00
mindspore-ci-bot 62a6c8e1f5 !13988 dynamic memory pool support multi-thread
From: @limingqi107
Reviewed-by: @cristoval,@wilfchen
Signed-off-by: @wilfchen
2021-03-25 09:36:20 +08:00
mindspore-ci-bot e16630c844 !13914 modified train.py in yolov3_darknet53 network
From: @shuzigood
Reviewed-by: @ouwenchang,@wuxuejian
Signed-off-by: @wuxuejian
2021-03-25 09:34:53 +08:00
mindspore-ci-bot da758e6d6c !13986 update the documentation of StridedSlice, Gather, GatherNd, etc. operators.
From: @wangshuide2020
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-03-25 09:25:46 +08:00
mindspore-ci-bot ea306ab000 !13881 fix bug of quant export
From: @chengxb7532
Reviewed-by: @kisnwang,@linqingke
Signed-off-by: @linqingke
2021-03-25 09:25:41 +08:00
mindspore-ci-bot 6a9e55a382 !13913 [MD][r1.2] Fix a TestStepParallel core dump
From: @xiefangqi
Reviewed-by: @pandoublefeng,@liucunwei
Signed-off-by: @pandoublefeng,@liucunwei
2021-03-25 09:24:28 +08:00
mindspore-ci-bot 5865639c7a !13968 Get correct monad
From: @liangzelang
Reviewed-by: @kisnwang,@zhoufeng54
Signed-off-by: @zhoufeng54
2021-03-25 09:10:49 +08:00
mindspore-ci-bot f9d565deba !13990 [MS][LITE]fix a bug of the fc op in the fp16 subgraph and add models to the entrance guard
From: @probiotics_53
Reviewed-by: @zhanghaibo5,@zhang_xue_tong
Signed-off-by: @zhanghaibo5
2021-03-25 09:01:10 +08:00
mindspore-ci-bot bf12c05f8f !13980 [MSLITE] Fix bug of pad operator.
From: @wang_shaocong
Reviewed-by: @zhang_xue_tong,@zhanghaibo5
Signed-off-by: @zhang_xue_tong
2021-03-25 08:57:50 +08:00
mindspore-ci-bot dedc229d2f !13841 [MS][LITE][r1.2]add GetOutputsByNodeName
From: @lx0095
Reviewed-by: 
Signed-off-by:
2021-03-25 08:57:05 +08:00
mindspore-ci-bot bbc7bd763a !14012 fix weight quant may print error log
From: @xutianchun
Reviewed-by: @zhang_xue_tong,@zhanghaibo5
Signed-off-by: @zhanghaibo5
2021-03-25 08:56:27 +08:00
mindspore-ci-bot c72c1209d9 !13994 [MS_LITE] fix decoder to r1.2
From: @YeFeng_24
Reviewed-by: @hangangqiang,@zhang_xue_tong
Signed-off-by: @hangangqiang
2021-03-25 08:50:56 +08:00
mindspore-ci-bot 63dcac269f !13956 fix warning error of generated code
From: @yangjie159
Reviewed-by: @wangchengyuan,@zhang_xue_tong
Signed-off-by: @wangchengyuan
2021-03-25 08:47:21 +08:00
mindspore-ci-bot d6273a3c86 !13908 CUDNN STATUS NOT SUPPORTED
From: @ding_fei_fei
Reviewed-by: @kingxian,@zhunaipan
Signed-off-by: @kingxian
2021-03-24 22:11:13 +08:00
mindspore-ci-bot 610297ba40 !13979 modify release.md
From: @huangbingjian
Reviewed-by: @hwhewei,@zh_qh
Signed-off-by: @zh_qh
2021-03-24 21:59:41 +08:00
Zhang Qinghua 5178ed7cb8 Keep states between hyper map inner ops. 2021-03-24 21:37:59 +08:00
huangbingjian c8872a582d modify release.md 2021-03-24 21:00:36 +08:00
mindspore-ci-bot e2c5b331bd !13893 removed the useless link of apply form
From: @oacjiewen
Reviewed-by: @wuxuejian,@linqingke
Signed-off-by: @wuxuejian
2021-03-24 20:51:19 +08:00
xutianchun c829fc5544 fix weight quant may print error log 2021-03-24 20:45:57 +08:00
mindspore-ci-bot 3271ee2c52 !14007 mod test level of test_lenet_quant.py
From: @xsmq
Reviewed-by: @kisnwang,@chujinjin
Signed-off-by: @chujinjin
2021-03-24 20:37:04 +08:00
xsmq 1e3a20c5c9 offline test_lenet_quant 2021-03-24 20:31:21 +08:00
tanghuikang bdf48e2341 Ignore transpose with nop_op attr in pynative infer 2021-03-24 20:06:51 +08:00
xiefangqi 9da6205827 md fix gpu profiling device id issue 2021-03-24 20:01:57 +08:00
mindspore-ci-bot c07feca6a9 !13958 update documentation of ExponentialDecayLR, PolynomialDecayLR, etc.
From: @mind-lh
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-03-24 19:59:48 +08:00
mindspore-ci-bot 5f3f827117 !13807 r1.2: Revert nn.BatchNorm3d.
From: @liu_xiao_93
Reviewed-by: @liangchenghui
Signed-off-by: @liangchenghui
2021-03-24 19:58:44 +08:00
zengxianglong 4ba2a2f913 fix a bug of the fc op in the fp16 subgraph and add models to the entrance guard 2021-03-24 19:47:08 +08:00
mindspore-ci-bot 1e486b6f34 !13822 skip empty graph in profiling
From: @jojobugfree
Reviewed-by: @zhoufeng54,@kisnwang
Signed-off-by: @zhoufeng54
2021-03-24 19:15:02 +08:00
caifubi b073f277f1 add RANK_TABLE_FILE for PyNative Hccl 2021-03-24 18:36:22 +08:00
yefeng f9159f8fc0 fix_decoder_fp16-r1.2-2 2021-03-24 18:32:12 +08:00
wang_shaocong 1224733f2d [MSLITE] Fix bug of the padding parameter of onnx pad operator. 2021-03-24 17:48:14 +08:00
limingqi107 92ba89b88e dynamic memory pool support multi-thread 2021-03-24 17:40:07 +08:00
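For context on !13988: making a dynamic memory pool usable from multiple threads usually comes down to guarding the pool's bookkeeping with a lock so concurrent Alloc/Free calls cannot race. A minimal sketch under that assumption; ThreadSafePool is a hypothetical stand-in, not the MindSpore pool class:

```cpp
#include <cstddef>
#include <map>
#include <mutex>

// Hypothetical pool: a mutex serializes all bookkeeping so the pool
// can be called from multiple threads safely.
class ThreadSafePool {
 public:
  void *Alloc(size_t size) {
    std::lock_guard<std::mutex> lock(mutex_);
    void *addr = ::operator new(size);  // stand-in for the real block search
    blocks_[addr] = size;
    return addr;
  }

  void Free(void *addr) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = blocks_.find(addr);
    if (it == blocks_.end()) {
      return;  // ignore pointers the pool does not own
    }
    ::operator delete(it->first);
    blocks_.erase(it);
  }

 private:
  std::mutex mutex_;                 // guards blocks_
  std::map<void *, size_t> blocks_;  // live allocations and their sizes
};
```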
wangshuide2020 847c850744 update the documentation of StridedSlice, Gather, GatherNd, etc. operators. 2021-03-24 17:34:12 +08:00
liuhe cbd357549f XXXLR ops do not support CPU: ExponentialDecayLR, PolynomialDecayLR, etc. 2021-03-24 17:30:37 +08:00
kswang 8e1954ba55 optimize graph order for commops 2021-03-24 17:07:00 +08:00
mindspore-ci-bot 1e19304c91 !13933 fix mobilenetv2 readme.md
From: @zoloft
Reviewed-by: @wangchengyuan,@zhanghaibo5
Signed-off-by: @wangchengyuan
2021-03-24 16:54:28 +08:00
mindspore-ci-bot fe52fb4668 !13902 update applyAdagrad: code sync from master to r1.2
From: @zyx5256
Reviewed-by: @wuxuejian,@liangchenghui
Signed-off-by: @wuxuejian
2021-03-24 16:53:27 +08:00
liangzelang 02c1954d45 [Auto-Monad] get correct monad 2021-03-24 16:51:40 +08:00
lihongkang ed6e49fc82 update interface for yolov4 and unet 310 infer 2021-03-24 16:49:03 +08:00
yangjie159 c2e2c8d93e fix warning error of generated code 2021-03-24 16:24:05 +08:00
mindspore-ci-bot 6f80fa683b !13899 [MS][LITE][CPU] fix transpose bug
From: @lzkcode
Reviewed-by: @hangangqiang,@zhang_xue_tong
Signed-off-by: @zhang_xue_tong
2021-03-24 16:22:51 +08:00
mwang 7366d5cf1b add data sync of ocean model 2021-03-24 16:03:55 +08:00
mindspore-ci-bot 67b68c1bd2 !13880 fix TrainOneStepWithLossScaleCell execution order
From: @youui
Reviewed-by: @kisnwang,@zhoufeng54
Signed-off-by: @zhoufeng54
2021-03-24 15:57:52 +08:00
mindspore-ci-bot 35e8165fcc !13847 [r1.2] Update api documentation of MatrixDiag, ScatterND and NMSWithMask
From: @yuyiyang_3418
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-03-24 15:52:42 +08:00
z00512249 2a28b4ee41 fix mobilenetv2/README.md 2021-03-24 15:47:25 +08:00
mindspore-ci-bot c78e0f32b3 !13788 python3.9 update
From: @yepei6
Reviewed-by: @kingxian
Signed-off-by: @kingxian
2021-03-24 15:32:05 +08:00
mindspore-ci-bot d7bd79244a !13752 Add check of ResizePreserveARWithFiller operation to r1.2
From: @shenwei41
Reviewed-by: 
Signed-off-by:
2021-03-24 15:27:45 +08:00
yuyiyang_3418 ba7eddb947 Update API documentation of MatrixDiag, ScatterND and NMSWithMask 2021-03-24 15:27:19 +08:00
mindspore-ci-bot 447aed7b13 !13824 [MD] fix bug in closing multiprocessing pool
From: @liyong126
Reviewed-by: @heleiwang,@liucunwei
Signed-off-by: @liucunwei
2021-03-24 15:26:32 +08:00
yanzhenxiang2020 245575693b fix NoRepeatNGram example 2021-03-24 15:20:54 +08:00
mindspore-ci-bot 0084f0aa43 !13949 Fix error words for api comments.
From: @zhang_yi2020
Reviewed-by: @gemini524,@liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui,@wuxuejian
2021-03-24 15:10:29 +08:00
zhangyi ff0cb148e4 fix error words for api comments. 2021-03-24 14:55:16 +08:00
mindspore-ci-bot ececef5ed0 !13829 fix a bug that cuda error show wrong name in log
From: @hanhuifeng2020
Reviewed-by: @gaoxiong1,@dylangeng
Signed-off-by: @dylangeng
2021-03-24 14:53:08 +08:00
mindspore-ci-bot ffbfcd08a9 !13934 fix issue I3CSPW
From: @pan-fei
Reviewed-by: @gemini524,@ljl0711,@zhunaipan
Signed-off-by: @ljl0711
2021-03-24 14:48:00 +08:00
mindspore-ci-bot 407b25887b !13863 [MS][LITE]multi-inputs models support fp16 mode and add models to the entrance guard
From: @probiotics_53
Reviewed-by: @hangangqiang,@zhanghaibo5
Signed-off-by: @hangangqiang
2021-03-24 14:42:04 +08:00
mindspore-ci-bot a5a3a9b508 !13854 add model implement for micro
From: @yangjie159
Reviewed-by: @wangchengyuan,@zhang_xue_tong
Signed-off-by: @wangchengyuan
2021-03-24 14:40:32 +08:00
mindspore-ci-bot 9cb27e5617 !13917 [auto_monad]Remove uselesee updatestate pattern and fix loss and J order.
From: @linqingke
Reviewed-by: @xu-yfei,@zh_qh
Signed-off-by: @xu-yfei,@zh_qh
2021-03-24 14:39:52 +08:00
mindspore-ci-bot 86d3a8eea0 !13878 [MS_LITE] fix issue to r1.2
From: @YeFeng_24
Reviewed-by: @hangangqiang,@zhanghaibo5
Signed-off-by: @hangangqiang
2021-03-24 14:33:56 +08:00
mindspore-ci-bot f0bb74c432 !13904 check whether communication unit has been initialized and fix auto parallel weight init seed setting
From: @yao_yf
Reviewed-by: @stsuteng,@kisnwang
Signed-off-by: @stsuteng
2021-03-24 14:33:27 +08:00
caojiewen 17b21c7e05 1. removed the link of the apply form
2. fixed the lint errors
3. fixed the code spelling errors
2021-03-24 14:25:52 +08:00
panfei 89dcc5cec0 fix issue I3CSPW 2021-03-24 14:14:13 +08:00
GAO_HYP_XYJ 6a63b78bf2 update model_zoo/research/hpc/sponge/README.md. 2021-03-24 13:08:10 +08:00
zhuyuxiao fd93ca0dcd adagrad: support output on gpu 2021-03-24 12:55:35 +08:00
zhoufeng 6f160674f3 library size optimization for 310
Signed-off-by: zhoufeng <zhoufeng54@huawei.com>
2021-03-24 12:54:09 +08:00
lixian 7a84f23ae5 add GetOutputsByNodeName 2021-03-24 11:45:26 +08:00
mindspore-ci-bot 0b1686e0b3 !13846 Fix crnn precision by removing manual float16 assignment
From: @c_34
Reviewed-by: @wuxuejian,@linqingke
Signed-off-by: @linqingke
2021-03-24 11:22:12 +08:00
mindspore-ci-bot 437ef101db !13803 [MS][LITE][r1.2] it doesn't clear the output/ when build inference package
From: @sunsuodong
Reviewed-by: @zhang_xue_tong
Signed-off-by: @zhang_xue_tong
2021-03-24 11:22:04 +08:00
yangjie159 60eb1e9c88 add model implement for micro 2021-03-24 11:19:21 +08:00
mindspore-ci-bot 24a0bfb6a3 !13793 remove control_depend from py file
From: @huangbingjian
Reviewed-by: 
Signed-off-by:
2021-03-24 11:03:50 +08:00
zengxianglong 33583ce794 multi-inputs models support fp16 mode and add models to the entrance guard 2021-03-24 10:38:51 +08:00
linqingke 7cc06b483e remove useless updatestate pattern and fix loss and J order 2021-03-24 10:18:50 +08:00
xiefangqi e1651ac1db Fix an ut cpp core issue 2021-03-24 10:17:57 +08:00
zhanke 6504d04f1c add tprr 8p version 2021-03-24 10:14:53 +08:00
wsq3 7524bb97ef modified train.py in network 2021-03-24 10:08:10 +08:00
zhaoting 83a2a87cf3 change resnet101 allreduce split 2021-03-24 09:55:58 +08:00
zhanghuiyao 999ccc1d4b modify yolov4 eval/test shape 2021-03-24 09:52:33 +08:00
dingpeifei eb9f9ce38b CUDNN STATUS NOT SUPPORTED 2021-03-24 09:45:21 +08:00
mindspore-ci-bot 05a99e996d !13886 fix quant _partial_init example
From: @yuchaojie
Reviewed-by: @zhoufeng54,@kisnwang
Signed-off-by: @zhoufeng54
2021-03-24 09:40:41 +08:00
mindspore-ci-bot 72eedef82e !13871 add README for mobilenetv2
From: @zoloft
Reviewed-by: @hangangqiang
Signed-off-by:
2021-03-24 09:37:53 +08:00
yao_yf 6523c69b37 check whether init has been done in the distributed scene
check_communication_init_and_fix_auto_parallel_set_seed
2021-03-24 09:37:42 +08:00
lzk 442cd62d3d power bug fix 2021-03-23 18:13:00 -07:00
mindspore-ci-bot 682a2f7209 !13855 Fix a bug by revert GetCNodeTarget() for Load
From: @hwhewei
Reviewed-by: @zhunaipan,@zh_qh
Signed-off-by: @zh_qh
2021-03-24 08:59:02 +08:00
Peilin Wang 8350a31d4c add float64 support to cumsum 2021-03-23 17:02:20 -04:00
TFBunny 576334179d add fix to IndexAdd 2021-03-23 16:41:41 -04:00
shenwei41 a32a483e97 Add check of ResizePreserveARWithFiller on r1.2 2021-03-23 23:31:51 +08:00
yepei6 995d9a027e update python3.9 2021-03-23 22:39:34 +08:00
yuchaojie 05214d2d24 fix quant _partial_init example 2021-03-23 21:42:54 +08:00
mindspore-ci-bot e3b6f6f0a1 !13865 fix Vector && script
From: @zhujingxuan
Reviewed-by: @wangchengyuan,@jpc_chenjianping
Signed-off-by: @wangchengyuan
2021-03-23 21:26:07 +08:00
liuxiao93 a9b0d3f731 revert nn.BatchNorm3d. 2021-03-23 21:12:55 +08:00
z00512249 900f01af0b add README for mobilenetv2 2021-03-23 21:05:57 +08:00
huangbingjian 5a73a26fee remove control_depend from py file 2021-03-23 20:47:43 +08:00
mindspore-ci-bot 2a4e4d2a06 !13827 fix sponge ops
From: @jiahongqian
Reviewed-by: @wang_zi_dong,@ljl0711
Signed-off-by: @ljl0711
2021-03-23 20:42:25 +08:00
mindspore-ci-bot 0c44007e55 !13766 [ms][lite][cpu]power r12 optimize
From: @lzkcode
Reviewed-by: @zhanghaibo5,@zhang_xue_tong
Signed-off-by: @zhang_xue_tong
2021-03-23 20:20:51 +08:00
sunsuodong dcf0a1ecb1 fix maven package 2021-03-23 20:20:36 +08:00
mindspore-ci-bot 89aaa7ed89 !13812 fix error when use context in sub thread
From: @wangnan39
Reviewed-by: @zh_qh,@kingxian
Signed-off-by: @zh_qh
2021-03-23 20:20:33 +08:00
chengxianbin 2039d6c9d4 fix bug of quant export 2021-03-23 20:19:30 +08:00
mindspore-ci-bot 7dffe5a693 !13820 [MS][LITE][CPU]fix floormod op
From: @fuzhiye
Reviewed-by: @hangangqiang,@zhang_xue_tong
Signed-off-by: @zhang_xue_tong
2021-03-23 20:14:27 +08:00
yefeng e0194cf056 067-fix-issue_r1.2 2021-03-23 20:14:19 +08:00
zhujingxuan f874a35aa9 fix Vector && script 2021-03-23 20:08:04 +08:00
mindspore-ci-bot f30a0281a5 !13831 fixed memcpy error
From: @anancds
Reviewed-by: @cristoval,@limingqi107
Signed-off-by: @limingqi107
2021-03-23 19:36:50 +08:00
mindspore-ci-bot ca5b253a56 !13836 fix gthread_num
From: @zoloft
Reviewed-by: @wangchengyuan,@hangangqiang,@wangchengyuan
Signed-off-by: @wangchengyuan
2021-03-23 19:23:19 +08:00
mindspore-ci-bot 012af6f023 !13795 reduce cast number in syncbn
From: @yuchaojie
Reviewed-by: @kisnwang,@zhoufeng54
Signed-off-by: @zhoufeng54
2021-03-23 19:04:56 +08:00
yuzhenhua bb1ef1b547 fix 310 bug 2021-03-23 17:43:31 +08:00
unknown 6ab995b9c6 deeplabv3 & ssd & lstm new interface r1.2 2021-03-23 17:16:22 +08:00
z00512249 5d9124315a fix gthread_num 2021-03-23 17:15:19 +08:00
simson 4b3bee2531 clear memory after run op in CPU 2021-03-23 17:05:26 +08:00
yangwei c8f829fac9 fix trainonesteplossscale 2021-03-23 17:01:19 +08:00
He Wei eaa45227f0 Fix a bug by revert GetCNodeTarget() for Load 2021-03-23 16:53:59 +08:00
mindspore-ci-bot 12db983888 !13759 fix ST failure of resnet thor of r1.2
From: @wangmin0104
Reviewed-by: @kisnwang,@wang_zi_dong
Signed-off-by: @wang_zi_dong
2021-03-23 16:16:06 +08:00
chenhaozhe acdc196869 change manual float16 to auto mixed_precision 2021-03-23 16:15:23 +08:00
mindspore-ci-bot 24189d7518 !13817 micro support parallel, and fix bugs
From: @yangjie159
Reviewed-by: @wangchengyuan,@zhanghaibo5
Signed-off-by: @wangchengyuan
2021-03-23 15:59:48 +08:00
mindspore-ci-bot 433e0f6900 !13785 update example and run script
From: @zhujingxuan
Reviewed-by: @wangchengyuan,@zhanghaibo5
Signed-off-by: @wangchengyuan
2021-03-23 15:48:14 +08:00
mindspore-ci-bot 11ae248664 !13721 Modify MaximumGrad op
From: @ZhengQihao3f3f3f
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-03-23 15:39:00 +08:00
mindspore-ci-bot 83df79c2ce !13746 update documentation of IndexAdd operator, input of Slice, example of Unstack.
From: @mind-lh
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-03-23 15:38:49 +08:00
chendongsheng 9e7c256598 fixed memcpy error 2021-03-23 15:14:31 +08:00
hanhuifeng2020 5ae6a9459f fix a bug that cuda error show wrong name in log 2021-03-23 15:04:55 +08:00
q00596439 ec30d857a4 jiahong v1.2 sponge ops 2021-03-23 14:50:58 +08:00
liyong 6aecbf6ee4 fix bug of using pool 2021-03-23 14:34:55 +08:00
liuhe b12de37856 1. update the documentation of IndexAdd operator
2. update the input of Slice
3. update the example of Unstack.
2021-03-23 14:32:35 +08:00
fuzhiye 41ce6dc30d fix bug of floorMod 2021-03-23 14:26:50 +08:00
mindspore-ci-bot abc929c247 !13816 modify python ut for r1.2
From: @changzherui
Reviewed-by: @zhoufeng54,@xsmq
Signed-off-by: @xsmq
2021-03-23 14:18:02 +08:00
yangjie159 2682a72944 micro support parallel, and fix bugs 2021-03-23 14:16:25 +08:00
mindspore-ci-bot 162bb4e293 !13754 [MS][LITE][r1.2]fix write for big endian devices
From: @lx0095
Reviewed-by: @zhang_xue_tong,@hangangqiang
Signed-off-by: @zhang_xue_tong
2021-03-23 14:06:14 +08:00
mindspore-ci-bot bbfffc0c2c !13805 fix micro thread_pool
From: @zoloft
Reviewed-by: @wangchengyuan,@hangangqiang
Signed-off-by: @wangchengyuan
2021-03-23 14:05:15 +08:00
caifubi 9cf7773c0c skip empty graph in profiling 2021-03-23 13:49:59 +08:00
changzherui cef469b6e2 modify python ut for r1.2 2021-03-23 12:43:51 +08:00
mindspore-ci-bot a708ded285 !13796 [MS][LITE][CPU]fix bug of reduce
From: @fuzhiye
Reviewed-by: @zhanghaibo5,@zhang_xue_tong
Signed-off-by: @zhang_xue_tong
2021-03-23 11:28:01 +08:00
wangnan39@huawei.com 8f8ec5c847 fix error when use context in sub thread 2021-03-23 11:25:02 +08:00
mindspore-ci-bot 3d74caba98 !13667 [MS][LITE]append some models to the entrance guard
From: @probiotics_53
Reviewed-by: @hangangqiang,@zhanghaibo5
Signed-off-by: @zhanghaibo5
2021-03-23 11:10:51 +08:00
z00512249 60bd9248f2 fix micro thread_pool 2021-03-23 10:56:09 +08:00
mindspore-ci-bot cf7ccf2903 !13753 [MSLITE][Develop] fix bug of arm cpu op gather while running in multi-threaded mode
From: @yangruoqi713
Reviewed-by: @zhanghaibo5,@zhang_xue_tong
Signed-off-by: @zhang_xue_tong
2021-03-23 10:27:02 +08:00
yuchaojie 102d4b4024 reduce cast number in syncbn 2021-03-23 10:20:49 +08:00
mindspore-ci-bot 84a4b119bc !13763 [MSLITE][DEVELOP] optimize cpu fp16 op: lstm
From: @yangruoqi713
Reviewed-by: @hangangqiang,@zhang_xue_tong
Signed-off-by: @zhang_xue_tong
2021-03-23 10:18:54 +08:00
fuzhiye 0b1e7aebcd fix bug of reduce 2021-03-23 10:08:33 +08:00
mindspore-ci-bot 6f81d157aa !13771 fixed core dump
From: @anancds
Reviewed-by: @cristoval,@limingqi107
Signed-off-by: @limingqi107
2021-03-23 09:55:14 +08:00
mindspore-ci-bot 16bcb2a46d !13783 add micro example codes
From: @zoloft
Reviewed-by: @wangchengyuan,@hangangqiang
Signed-off-by: @wangchengyuan
2021-03-23 09:54:17 +08:00
mindspore-ci-bot efdf9638e5 !13778 add glibcxx param
From: @yepei6
Reviewed-by: @zhoufeng54,@kingxian
Signed-off-by: @kingxian
2021-03-23 09:44:54 +08:00
mindspore-ci-bot 1416db06e5 !13767 move the ascend_bucket import out of ENABLE_DUMP_IR
From: @luopengting
Reviewed-by: @ouwenchang,@yelihua
Signed-off-by: @yelihua
2021-03-23 09:41:16 +08:00
zhujingxuan f32235e4d4 add script to full process 2021-03-23 09:37:01 +08:00
mindspore-ci-bot 49a60f97cf !13779 fix bugs, move string to const blocks
From: @yangjie159
Reviewed-by: @hangangqiang,@wangchengyuan
Signed-off-by: @wangchengyuan
2021-03-23 09:31:52 +08:00
mindspore-ci-bot 7b3cf341af !13657 [MSLITE] Modify ci test script.
From: @wang_shaocong
Reviewed-by: @xsmq,@zhanghaibo5
Signed-off-by: @zhanghaibo5
2021-03-23 09:30:44 +08:00
mindspore-ci-bot 206c0d5023 !13733 add warning for parameter share among multi devices
From: @gong_zi_yan
Reviewed-by: @stsuteng,@zhunaipan
Signed-off-by: @stsuteng
2021-03-23 09:16:40 +08:00
wang_shaocong 2dbfe89f9d [MSLITE] Modify ci test script. 2021-03-23 09:15:53 +08:00
lixian 12e82a5dc7 fix write for big endian devices 2021-03-22 22:12:37 +08:00
yangjie159 4b801e18b3 fix bugs of micro 2021-03-22 21:58:14 +08:00
z00512249 6f4fb69f82 add micro example codes 2021-03-22 21:49:48 +08:00
chendongsheng a2f3ac5af3 fixed core dump 2021-03-22 21:16:02 +08:00
yepei6 48d50d97e6 add ENABLE_GLIBCXX param 2021-03-22 20:57:57 +08:00
mindspore-ci-bot b7479a7bf7 !13713 [MS][LITE][r1.2] it doesn't clear the output/ when building the inference package
From: @sunsuodong
Reviewed-by: @zhanghaibo5,@zhang_xue_tong
Signed-off-by: @zhang_xue_tong
2021-03-22 20:54:27 +08:00
mindspore-ci-bot 94315bee8e !13664 [MSLITE][Develop] fix LogicalAnd
From: @sunsuodong
Reviewed-by: @zhanghaibo5,@zhang_xue_tong
Signed-off-by: @zhang_xue_tong
2021-03-22 20:53:32 +08:00
wangshuide2020 d307320d1a fix the bug that SeLU supports only the float16 and float32 data types, and update the documentation of the Mish operator. 2021-03-22 20:42:26 +08:00
mindspore-ci-bot 9f6d04c17f !13698 fix auto tune compile info
From: @liubuyu
Reviewed-by: @zhoufeng54,@zhoufeng54,@jjfeing
Signed-off-by: @jjfeing
2021-03-22 20:40:39 +08:00
luopengting 21411e2ffd move the ascend_bucket import out of ENABLE_DUMP_IR 2021-03-22 20:25:03 +08:00
lzk 284c84a7da power_r12 2021-03-22 05:23:35 -07:00
yangruoqi713 1c20ddbc7c [MSLITE][DEVELOP] optimize cpu fp16 op: lstm 2021-03-22 20:12:45 +08:00
mindspore-ci-bot e4b6377149 !13694 Supports workaround for BatchNormGradCPU
From: @xutianchun
Reviewed-by: @HilbertDavid,@zhanghaibo5
Signed-off-by: @HilbertDavid
2021-03-22 19:12:55 +08:00
mindspore-ci-bot 4c02d67cbf !13694 Supports workaround for BatchNormGradCPU
From: @xutianchun
Reviewed-by: @HilbertDavid,@zhanghaibo5
Signed-off-by: @HilbertDavid
2021-03-22 19:12:54 +08:00
mindspore-ci-bot 4cba978b05 !13716 fix numpy native cpu ci error on branch 1.2
From: @yanglf1121
Reviewed-by: @guoqi1024,@liangchenghui
Signed-off-by: @liangchenghui
2021-03-22 19:11:01 +08:00
mindspore-ci-bot 16085022e3 !13744 remove zero or negative check in rsqrt_fp16 op
From: @hangangqiang
Reviewed-by: @zhanghaibo5,@wangchengyuan
Signed-off-by: @zhanghaibo5
2021-03-22 19:03:50 +08:00
mwang 34156d24d5 fix thor 2021-03-22 19:03:26 +08:00
mindspore-ci-bot 28f953cfbd !13731 [lite]fix fp16 gather bug
From: @xu_anyue
Reviewed-by: @hangangqiang,@zhanghaibo5
Signed-off-by: @hangangqiang
2021-03-22 19:00:15 +08:00
mindspore-ci-bot cdaf1132cb !13686 uniform interface, add vector and string self-defined
From: @yangjie159
Reviewed-by: 
Signed-off-by:
2021-03-22 18:52:23 +08:00
yangruoqi713 c557b8f167 [MSLITE][Develop] fix bug of arm cpu op gather while running in multi-threaded mode 2021-03-22 17:25:03 +08:00
mindspore-ci-bot 0763dfa3bb !13736 Fix some api comments.
From: @zhang_yi2020
Reviewed-by: @gemini524,@wuxuejian,@liangchenghui
Signed-off-by: @wuxuejian
2021-03-22 17:22:35 +08:00
mindspore-ci-bot b97656d3e8 !13730 Don't insert UpdateState for HyperMap func graph call, move auto monad eliminator out from CSE, and eliminate auto monad nodes for output node.
From: @zh_qh
Reviewed-by: @ginfung,@hwhewei
Signed-off-by: @hwhewei
2021-03-22 17:15:29 +08:00
mindspore-ci-bot afc798744d !13739 modify release note for delete MetaTensor
From: @Somnus2020
Reviewed-by: @kingxian,@zhoufeng54
Signed-off-by: @kingxian
2021-03-22 17:06:52 +08:00
zengxianglong 5acc8d1395 append some models to the entrance guard 2021-03-22 16:53:20 +08:00
hangangqiang 7132ea5e12 remove zero or negative check in rsqrt_fp16 op 2021-03-22 16:42:03 +08:00
lilei fa17b9b70d modify release note for delete MetaTensor 2021-03-22 16:21:43 +08:00
zhangyi 11283ad812 fix grammar and formatting errors in API comments. 2021-03-22 16:15:34 +08:00
yangjie159 abd3033e9b uniform interface, add vector and string self-defined 2021-03-22 16:12:04 +08:00
Ziyan d381fb50a4 add warning for parameter share among multi devices 2021-03-22 15:57:14 +08:00
liubuyu 30cd398158 fix tune compile info 2021-03-22 15:45:46 +08:00
xuanyue d9b2fe4266 fix fp16 gather bug 2021-03-22 15:25:09 +08:00
Zhang Qinghua ad11fd9ba5 Don't insert UpdateState for HyperMap func graph call.
Move auto monad eliminator out from CSE.
Eliminate auto monad nodes for output node.
2021-03-22 15:19:34 +08:00
mindspore-ci-bot 340583367f !13700 [MS][LITE][CPU]rewrite conv creator func
From: @fuzhiye
Reviewed-by: @zhanghaibo5,@zhang_xue_tong
Signed-off-by: @zhanghaibo5
2021-03-22 14:28:13 +08:00
mindspore-ci-bot 79eeae8f44 !13695 Fix the case of negative index in getitem depend reorder pass
From: @ginfung
Reviewed-by: @zh_qh,@hwhewei
Signed-off-by: @zh_qh
2021-03-22 14:19:00 +08:00
mindspore-ci-bot defcc0a074 !13709 update akg, fix akg_ext search path
From: @looop5
Reviewed-by: @dylangeng,@anyrenwei
Signed-off-by: @anyrenwei
2021-03-22 14:18:29 +08:00
mindspore-ci-bot 3d403455e7 !13615 change maskrcnn and maskrcnn_mobilenetv1 parameter dtype
From: @ttudu
Reviewed-by: @c_34,@wuxuejian
Signed-off-by: @c_34
2021-03-22 14:07:46 +08:00
mindspore-ci-bot b12b2248e8 !13696 show accurate code line when using uninitialized var in for and while
From: @zhangbuxue
Reviewed-by: @zh_qh,@ginfung
Signed-off-by: @zh_qh
2021-03-22 14:06:49 +08:00
yanglf1121 f41687bde9 fix numpy_native ci error on cpu 2021-03-22 13:34:47 +08:00
mindspore-ci-bot d6fb43e148 !13653 Add check to rgbtogray in r1.2
From: @shenwei41
Reviewed-by: @tiancixiao,@liucunwei,@heleiwang
Signed-off-by: @liucunwei
2021-03-22 12:30:19 +08:00
mindspore-ci-bot 9e00f9facb !13628 dataset: avoid double free when it call destroyHandle in force exist condition
From: @ms_yan
Reviewed-by: @liucunwei,@heleiwang
Signed-off-by: @liucunwei
2021-03-22 11:56:36 +08:00
looop5 ceb581dfa7 update akg, fix akg_ext search path 2021-03-22 11:26:11 +08:00
sunsuodong 5cd3f01eb2 build_clean_output 2021-03-22 11:13:37 +08:00
mindspore-ci-bot 12c77a231f !13585 [MS][LITE][CPU]rewrite conv creator func
From: @fuzhiye
Reviewed-by: @zhang_xue_tong,@hangangqiang
Signed-off-by: @zhang_xue_tong
2021-03-22 11:04:33 +08:00
mindspore-ci-bot a4570a1dfb !13663 update micro to 1.2
From: @zhujingxuan
Reviewed-by: @wangchengyuan,@zhanghaibo5
Signed-off-by: @wangchengyuan
2021-03-22 10:12:00 +08:00
sunsuodong 0b9a7d7967 fix_LogicalAnd 2021-03-22 10:05:33 +08:00
buxue de343a0e00 show accurate code line when using uninitialized var in for
(cherry picked from commit e3056ed9b2)
2021-03-22 09:39:19 +08:00
mindspore-ci-bot df4a3cdf22 !13619 cpp api modify
From: @zhoufeng54
Reviewed-by: 
Signed-off-by:
2021-03-22 09:37:05 +08:00
mindspore-ci-bot 3b9843f57e !13677 fix cmake lint
From: @zhoufeng54
Reviewed-by: @kisnwang,@xu-yfei
Signed-off-by: @xu-yfei
2021-03-22 09:35:45 +08:00
yujianfeng 3b521a1e18 Fix the case of negative index in GetitemDependReorder 2021-03-22 09:32:06 +08:00
xutianchun 7d181e9c99 Support BatchNormGrad for CPU 2021-03-22 09:25:19 +08:00
mindspore-ci-bot 3fbeb9d701 !13679 [ms][lite][cpu]softmax erf optimize
From: @lzkcode
Reviewed-by: @zhanghaibo5,@zhang_xue_tong
Signed-off-by: @zhang_xue_tong
2021-03-22 09:14:09 +08:00
mindspore-ci-bot 2666ec4df5 !13674 [ms][lite][cpu] gelu optimize
From: @lzkcode
Reviewed-by: @zhanghaibo5,@zhang_xue_tong
Signed-off-by: @zhang_xue_tong
2021-03-22 09:13:48 +08:00
mindspore-ci-bot d5eac8a91b !13644 [MSLITE] sync master train issue strideslicegrad and layernorm grad bugfix
From: @zhengjun10
Reviewed-by: @HilbertDavid,@HilbertDavid
Signed-off-by: @HilbertDavid,@HilbertDavid
2021-03-22 09:13:43 +08:00
mindspore-ci-bot 9e84b421dc !13652 [MS][RDR] optimize saving FuncGraph to use save params with save_graphs
From: @louie5
Reviewed-by: @lixiaohui33,@ouwenchang
Signed-off-by: @ouwenchang
2021-03-22 08:51:53 +08:00
lzk c749916ac6 softmax fp16 r12 2021-03-21 02:36:57 -07:00
zhoufeng 734cc674a5 fix cmake lint
Signed-off-by: zhoufeng <zhoufeng54@huawei.com>
2021-03-21 15:44:21 +08:00
lzk 92f16b35c6 gelu optimize 2021-03-21 00:42:47 -07:00
zhengjun10 76dc0e4ae8 fix train grad bug 2021-03-21 11:30:45 +08:00
ms_yan 6470ef8447 avoid double free for tdt channel 2021-03-20 23:12:58 +08:00
CaoJian 21a2c7bcbe fix centernet loss error. 2021-03-20 21:42:25 +08:00
lixian 9722a014c0 refactor cpp context, add string tensor, add get tensor by name 2021-03-20 20:52:38 +08:00
zhujingxuan 53b68c2444 update micro 2021-03-20 17:46:18 +08:00
mindspore-ci-bot b0f5781477 !13613 add CPU LogSoftMax
From: @zhao_ting_v
Reviewed-by: @wuxuejian,@liangchenghui
Signed-off-by: @wuxuejian
2021-03-20 17:21:42 +08:00
mindspore-ci-bot 674fa9b68c !13646 fix lstm weight quant
From: @xutianchun
Reviewed-by: @zhanghaibo5,@HilbertDavid
Signed-off-by: @HilbertDavid
2021-03-20 16:37:41 +08:00
mindspore-ci-bot a795835fd8 !13612 fix some bugs in resnet, ssd and naml
From: @zhao_ting_v
Reviewed-by: @c_34,@wuxuejian
Signed-off-by: @c_34
2021-03-20 16:23:18 +08:00
mindspore-ci-bot 06ba400604 !13650 [MD][r1.2] Revert profiling get device id rule
From: @xiefangqi
Reviewed-by: @heleiwang,@liucunwei
Signed-off-by: @liucunwei
2021-03-20 15:41:27 +08:00
shenwei41 8b5871ef44 Add check to rgbtogray in r1.2 2021-03-20 15:41:23 +08:00
mindspore-ci-bot 5fb50ca6b7 !13643 fix mobilenetv2 model for micro
From: @zoloft
Reviewed-by: @wangchengyuan,@zhanghaibo5
Signed-off-by: @wangchengyuan
2021-03-20 15:25:51 +08:00
mindspore-ci-bot e557d81c4f !13640 [MD] Fix Canny when the ksize is set to 7
From: @tiancixiao
Reviewed-by: @heleiwang,@liucunwei
Signed-off-by: @liucunwei
2021-03-20 15:05:41 +08:00
mindspore-ci-bot 0a5185c6a5 !13624 r1.2 fix minddata issues
From: @luoyang42
Reviewed-by: 
Signed-off-by:
2021-03-20 15:04:34 +08:00
zhaoting 8754aaeb74 add CPU LogSoftMax 2021-03-20 14:35:45 +08:00
mindspore-ci-bot dd2c2458a2 !13642 [Pynative] Fix Pynative AMP Backprop Bugs
From: @chenyijie6
Reviewed-by: @wilfchen,@chujinjin
Signed-off-by: @chujinjin
2021-03-20 14:07:14 +08:00
mindspore-ci-bot 64d9b5169a !13617 dynamic shape bugfix
From: @liubuyu
Reviewed-by: @zhoufeng54,@jjfeing
Signed-off-by: @jjfeing
2021-03-20 13:58:43 +08:00
xutianchun db6b45ee4d fix lstm weight quant bug 2021-03-20 12:10:58 +08:00
z00512249 f3be546e61 fix mobilenetv2 model for micro 2021-03-20 11:47:02 +08:00
chenyijie6 99ebe71b39 Fix Pynative AMP Backprop Bugs 2021-03-20 11:46:49 +08:00
louei5 fd51088da8 fix RDR saving FuncGraph and KernelGraph differently from save_graphs 2021-03-20 11:46:16 +08:00
mindspore-ci-bot d165f0e6f3 !13645 adjust performance of bert_thor in ci test script
From: @xsmq
Reviewed-by: @chujinjin,@lilongfei15
Signed-off-by: @lilongfei15
2021-03-20 11:43:21 +08:00
YangLuo e73e20928b fix minddata issue 2021-03-20 11:37:12 +08:00
Xiao Tianci 16ee8891ab fix canny when ksize is set to 7 2021-03-20 10:34:23 +08:00
xiefangqi 53a7bc6ec4 Revert "Fix profiling device id error"
This reverts commit b9f45b49ff.
2021-03-20 09:55:32 +08:00
xsmq f98109aa5a adjust performance of smoke bert_thor 2021-03-20 09:35:42 +08:00
liubuyu f303f5ff6e dynamic shape bug fix 2021-03-19 18:00:17 +08:00
mindspore-ci-bot 0451a800bc !13571 Added seed to prevent random failures in ut
From: @ezphlow
Reviewed-by: 
Signed-off-by:
2021-03-19 17:17:42 +08:00
mindspore-ci-bot b5396bc1cd !13540 [MD] release gil in pynative && GPU
From: @liyong126
Reviewed-by: @liucunwei,@heleiwang
Signed-off-by: @liucunwei
2021-03-19 17:17:08 +08:00
mindspore-ci-bot 6b8bef2c8a !13561 [MD][Profiling] Fix Device ID profiling error
From: @xiefangqi
Reviewed-by: 
Signed-off-by:
2021-03-19 17:14:01 +08:00
mindspore-ci-bot 1ecb0fde8f !13573 update commit id to mindspore master
From: @shenwei41
Reviewed-by: @xsmq,@zhoufeng54
Signed-off-by: @xsmq
2021-03-19 17:11:23 +08:00
mindspore-ci-bot 7de2d7b331 !13520 fix schema bug
From: @lyvette
Reviewed-by: @hangangqiang,@zhanghaibo5
Signed-off-by: @hangangqiang
2021-03-19 16:50:14 +08:00
mindspore-ci-bot 3539952b66 !13603 Some API comments should be fixed.
From: @zhang_yi2020
Reviewed-by: @gemini524,@liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui,@wuxuejian
2021-03-19 15:59:23 +08:00
CaoJian 27e933ec86 add kwargs for vgg16 2021-03-19 15:56:37 +08:00
zhangyi 69ee1a077f fix api comment's format. 2021-03-19 15:51:02 +08:00
mindspore-ci-bot 83d4c8dbe3 !13425 optimization shell in network
From: @shuzigood
Reviewed-by: @wuxuejian,@guoqi1024,@ouwenchang,@wuxuejian
Signed-off-by: @wuxuejian
2021-03-19 15:15:53 +08:00
mindspore-ci-bot b75418bfb5 !13512 add some expander ops
From: @zengzitao
Reviewed-by: 
Signed-off-by:
2021-03-19 15:00:24 +08:00
mindspore-ci-bot 5b95409022 !13512 add some expander ops
From: @zengzitao
Reviewed-by: 
Signed-off-by:
2021-03-19 15:00:23 +08:00
zhengqihao f3a1a69ce5 modified MaximumGrad 2021-03-19 15:00:12 +08:00
mindspore-ci-bot 2fadad0875 !13121 expander lamb_apply_optimizer_assign
From: @wenfangpei
Reviewed-by: 
Signed-off-by:
2021-03-19 14:59:37 +08:00
mindspore-ci-bot 38b5ff71ad !13513 support generate ops.fbs
From: @yeyunpeng2020
Reviewed-by: 
Signed-off-by:
2021-03-19 14:55:06 +08:00
shenwei41 f83c9f19ba update commit_id to mindspore master 2021-03-19 14:52:43 +08:00
mindspore-ci-bot 8e8f3043f9 !12115 IR operators of GPU and CPU are unified as batchnorm
From: @ding_fei_fei
Reviewed-by: 
Signed-off-by:
2021-03-19 14:51:26 +08:00
ttudu dad3172abb maskrcnn parameter dtype 2021-03-19 14:28:34 +08:00
mindspore-ci-bot 1d505ebad3 !13528 Optimize the functions of log module
From: @askmiao
Reviewed-by: @ginfung,@zh_qh,@ginfung
Signed-off-by: @zh_qh
2021-03-19 14:06:35 +08:00
mindspore-ci-bot 72cc142a0b !13574 fix lite training
From: @xutianchun
Reviewed-by: @HilbertDavid,@hangangqiang
Signed-off-by: @HilbertDavid
2021-03-19 14:06:33 +08:00
mindspore-ci-bot 94841e4889 !13544 add fp16 for const input -> valuenode
From: @simson_wu
Reviewed-by: 
Signed-off-by:
2021-03-19 13:56:57 +08:00
mindspore-ci-bot 6c096bff31 !13502 Modify Profiler API example to executable scripts
From: @gzhcv
Reviewed-by: @yelihua
Signed-off-by:
2021-03-19 11:27:18 +08:00
mindspore-ci-bot caaff9ab03 !13537 TimeDistributed: change description of input argument
From: @zyx5256
Reviewed-by: @wuxuejian,@liangchenghui
Signed-off-by: @wuxuejian
2021-03-19 11:23:59 +08:00
mindspore-ci-bot 9a4ccaf913 !13552 uniform interface, remove c API
From: @yangjie159
Reviewed-by: @wangchengyuan,@hangangqiang
Signed-off-by: @wangchengyuan
2021-03-19 11:15:02 +08:00
wsq3 8bdea4ab54 modified shell in network 2021-03-19 11:10:09 +08:00
mindspore-ci-bot f7ff8e81cd !13469 add MS_RDR_ENABLE and MS_RDR_PATH
From: @luopengting
Reviewed-by: @ouwenchang,@yelihua
Signed-off-by: @yelihua
2021-03-19 11:07:07 +08:00
mindspore-ci-bot b605fdb3cf !13491 [ms][lite][cpu] max thread auto get
From: @lzkcode
Reviewed-by: 
Signed-off-by:
2021-03-19 11:03:57 +08:00
chenjianping 0cb6117709 support generate ops.fbs 2021-03-19 11:03:04 +08:00
mindspore-ci-bot 5d410342e9 !13566 fix transformer readme
From: @yuchaojie
Reviewed-by: @oacjiewen,@c_34
Signed-off-by: @c_34
2021-03-19 10:28:15 +08:00
yoni cd6d9131e0 bug fix and improvement 2021-03-19 10:26:25 +08:00
mindspore-ci-bot 7607e92877 !13555 set abstract for maketuple
From: @youui
Reviewed-by: @kisnwang,@zhoufeng54
Signed-off-by: @zhoufeng54
2021-03-19 10:23:34 +08:00
mindspore-ci-bot f2bf380b85 !13334 add format and type before interrupt
From: @zhaodezan
Reviewed-by: 
Signed-off-by:
2021-03-19 10:03:57 +08:00
askmiao 4e39eab473 modify the log module 2021-03-19 09:59:45 +08:00
wenfangpei 043a558ae2 expander lamb_apply_optimizer_assign 2021-03-19 09:54:10 +08:00
xiefangqi b9f45b49ff Fix profiling device id error 2021-03-19 09:52:55 +08:00
mindspore-ci-bot dd9f227fd6 !13570 fix GPU Print seg fault when printing grad result
From: @TFbunny
Reviewed-by: @robingrosman,@liangchenghui
Signed-off-by: @liangchenghui
2021-03-19 09:51:24 +08:00
mindspore-ci-bot e8cb23e35e !13568 [MS][LITE] fix memory leak
From: @probiotics_53
Reviewed-by: @hangangqiang,@zhang_xue_tong
Signed-off-by: @hangangqiang
2021-03-19 09:23:22 +08:00
Eric a3c98d9d59 Added fix for random core dump 2021-03-18 21:15:50 -04:00
lzk 5ed08ebe51 max thread num 2021-03-18 18:14:37 -07:00
mindspore-ci-bot a7c1b6a1ef !12752 [MSLITE] Modify ci test script.
From: @wang_shaocong
Reviewed-by: 
Signed-off-by:
2021-03-19 09:01:39 +08:00
mindspore-ci-bot b22d4f99a5 !13503 Only continuous loads are eliminated
From: @Margaret_wangrui
Reviewed-by: 
Signed-off-by:
2021-03-19 08:56:15 +08:00
TFBunny 32e86f4166 hot fix for print 2021-03-18 13:32:04 -04:00
zengxianglong 4aed10f552 fix memory leak 2021-03-18 21:45:30 +08:00
mindspore-ci-bot 0fb8cd888d !13567 update submodule akg commit id
From: @looop5
Reviewed-by: @xsmq,@zh_qh
Signed-off-by: @zh_qh
2021-03-18 21:45:19 +08:00
looop5 7c5decd880 update submodule akg commit id 2021-03-18 21:34:39 +08:00
mindspore-ci-bot c520f3deef !13547 op type adapter
From: @liubuyu
Reviewed-by: @zhoufeng54,@jjfeing
Signed-off-by: @jjfeing
2021-03-18 21:33:16 +08:00
gzhcv 95b7a6bcfb Modify Profiler API docs 2021-03-18 21:16:51 +08:00
yangjie159 72963bd2ea uniform interface, remove c API 2021-03-18 20:54:20 +08:00
mindspore-ci-bot 7d5701c3e9 !13383 modify supported platform of gamma and poisson
From: @bingyaweng
Reviewed-by: @wang_zi_dong,@zichun_ye
Signed-off-by: @ljl0711
2021-03-18 20:37:27 +08:00
mindspore-ci-bot e99e98e2e0 !13519 Adding densenet100 model to modelzoo
From: @huangbo77
Reviewed-by: 
Signed-off-by:
2021-03-18 20:37:26 +08:00
mindspore-ci-bot 255d8e50da !13522 Change initializer of embedding table in bert.
From: @c_34
Reviewed-by: @wuxuejian,@linqingke
Signed-off-by: @linqingke
2021-03-18 20:37:20 +08:00
yuchaojie 483ab836c9 fix transformer readme 2021-03-18 20:35:12 +08:00
mindspore-ci-bot 103922cde5 !13523 Update doc for forwardvalueandgrad
From: @joylvliang
Reviewed-by: @chujinjin,@kisnwang
Signed-off-by: @chujinjin
2021-03-18 20:20:46 +08:00
zhaodezan b1cea1bc56 add format and type before interrupt 2021-03-18 20:18:19 +08:00
mindspore-ci-bot 64a93c2089 !12699 ResNext101 64x4d mindspore ECNU liyiming
From: @neoming
Reviewed-by: @oacjiewen
Signed-off-by:
2021-03-18 20:00:12 +08:00
zhaoting 26457a8ee3 fix some bugs in resnet, ssd and naml 2021-03-18 19:53:33 +08:00
mindspore-ci-bot aede003317 !13549 make IncorporateGetItemSwitch pass possible if there is env_setitem/env_getitem in that funcgraph
From: @xychow
Reviewed-by: @zh_qh,@ginfung
Signed-off-by: @zh_qh
2021-03-18 19:50:19 +08:00
mindspore-ci-bot 4a54caa721 !13525 [auto-monad] Fix multi-call output parameter be overwritten issue
From: @hwhewei
Reviewed-by: @ginfung,@zh_qh
Signed-off-by: @zh_qh
2021-03-18 19:33:50 +08:00
Margaret_wangrui aeb43e5167 eliminate consecutive loads 2021-03-18 19:16:41 +08:00
liyong f4ca8b6783 fix gil release 2021-03-18 19:16:18 +08:00
yangwei e34b2873fa set abstract for maketuple 2021-03-18 19:14:28 +08:00
mindspore-ci-bot 93c21f99e2 !13539 fix occasional precision failure of CosineEmbeddingLoss
From: @david-he91
Reviewed-by: @liangchenghui,@wuxuejian
Signed-off-by: @liangchenghui
2021-03-18 19:12:34 +08:00
zengzitao d0a656f3cd add some expander ops 2021-03-18 19:03:36 +08:00
dingpeifei 87e41aaeee IR operators of GPU and CPU are unified as batchnorm 2021-03-18 19:02:28 +08:00
mindspore-ci-bot 90d1f24b9a !13541 update submodule akg
From: @looop5
Reviewed-by: @anyrenwei,@dylangeng
Signed-off-by: @anyrenwei
2021-03-18 18:44:10 +08:00
mindspore-ci-bot bf29da4bd5 !13533 [MD] fix python tokenizer
From: @luoyang42
Reviewed-by: @heleiwang,@liucunwei
Signed-off-by: @liucunwei
2021-03-18 17:45:19 +08:00
liubuyu 23ae84a023 op type adapter 2021-03-18 17:24:45 +08:00
mindspore-ci-bot 7cd339368d !13480 fix train example script
From: @xutianchun
Reviewed-by: @hangangqiang,@HilbertDavid
Signed-off-by: @HilbertDavid
2021-03-18 17:16:25 +08:00
mindspore-ci-bot 4d184cdbb1 !13521 instance_norm_fusion & onnx_layernorm_fusion
From: @wangzhe128
Reviewed-by: @hangangqiang,@zhanghaibo5
Signed-off-by: @hangangqiang
2021-03-18 17:16:20 +08:00
simson dc8a279eb6 add fp16 for const input -> valuenode 2021-03-18 17:04:12 +08:00
mindspore-ci-bot eaecc83ec2 !13526 modify network_define for fasterrcnn/maskrcnn/maskrcnn_mobilenetv/deeptext
From: @huangbingjian
Reviewed-by: @zh_qh,@zhunaipan
Signed-off-by: @zh_qh
2021-03-18 16:53:48 +08:00
huangbo77 5765337238 added densenet100 2021-03-18 16:30:20 +08:00
looop5 1e6a35cb78 update submodule akg 2021-03-18 16:27:22 +08:00
liuyu 9914f37eae fix infer registry bug 2021-03-18 16:11:56 +08:00
mindspore-ci-bot b52f0ced25 !13524 fix log spelling errors
From: @changzherui
Reviewed-by: @kingxian,@zhoufeng54,@kingxian
Signed-off-by: @kingxian,@kingxian
2021-03-18 16:03:35 +08:00
mindspore-ci-bot f921921c56 !13076 feature, uniform interface of micro and lite
From: @yangjie159
Reviewed-by: @wangchengyuan
Signed-off-by:
2021-03-18 16:02:18 +08:00
mindspore-ci-bot fb45054477 !13293 Fix some op docs
From: @xcnick
Reviewed-by: 
Signed-off-by:
2021-03-18 16:00:40 +08:00
zhousiyi 3f2a08c1d0 incorporate TupleGetItem Switch if EnvSetItem or EnvGetItem exists in the same func graph, so that EnvGetItem or EnvSetItem can be eliminated even when the DEFER_INLINE flag is set; this way, EnvGetItem or EnvSetItem is eliminated before the second-order J is expanded 2021-03-18 07:57:05 +00:00
mindspore-ci-bot f5393aaf20 !13527 Fix some error format of api comments.
From: @zhang_yi2020
Reviewed-by: @gemini524,@wuxuejian,@liangchenghui
Signed-off-by: @wuxuejian,@liangchenghui
2021-03-18 15:55:05 +08:00
hedongdong b9a773955c fix occasional precision failure of CosineEmbeddingLoss 2021-03-18 15:54:45 +08:00
zhuyuxiao a11d17db1a I3BPK4: adjust description of input argument 2021-03-18 15:50:34 +08:00
luopengting 23ba47df6c add MS_RDR_ENABLE and MS_RDR_PATH
MS_RDR_ENABLE should be in {0, 1}; if it is set to any other value, it will
be treated as 0 and a WARNING will be printed to tell the user.
The settings in the config file set by the user take precedence over the
environment variables: if 'rdr_enable' is 0 in the environment variable,
and 'enable' is true in the config file, the RDR will be turned on.
2021-03-18 15:48:59 +08:00
wangzhe 70af1d1615 instance norm fusion & support onnx layer norm 2021-03-18 15:47:27 +08:00
YangLuo f99204b292 fix python tokenizer 2021-03-18 15:47:09 +08:00
He Wei 01eaaed85f [auto-monad] Fix multi-call output parameter be overwritten issue 2021-03-18 15:35:38 +08:00
xutianchun 344df139dc fix train demo script 2021-03-18 15:15:30 +08:00
yangjie159 08efea8791 feature, uniform interface of micro and lite 2021-03-18 15:08:00 +08:00
mindspore-ci-bot 0bd1e34a4d !13135 net onto gate
From: @zhaozhenlong
Reviewed-by: 
Signed-off-by:
2021-03-18 15:00:29 +08:00
mindspore-ci-bot 596df720af !13275 add splice to master
From: @zoloft
Reviewed-by: @wangchengyuan
Signed-off-by: @wangchengyuan
2021-03-18 14:58:00 +08:00
mindspore-ci-bot c75fa654a3 !13517 add check input args for embedinglookup
From: @lianliguang
Reviewed-by: @ginfung,@zh_qh
Signed-off-by: @zh_qh
2021-03-18 14:53:06 +08:00
mindspore-ci-bot 83fd1691b1 !13391 modify export mindir
From: @changzherui
Reviewed-by: @zhoufeng54,@kingxian
Signed-off-by: @kingxian
2021-03-18 14:42:39 +08:00
mindspore-ci-bot 075d737127 !13370 add lstm training attr
From: @huaweib
Reviewed-by: @zhoufeng54,@kisnwang
Signed-off-by: @kisnwang
2021-03-18 14:41:06 +08:00
mindspore-ci-bot d6ddd4a107 !13421 Register AKG kernel
From: @wilfchen
Reviewed-by: @limingqi107,@cristoval
Signed-off-by: @cristoval
2021-03-18 14:27:42 +08:00
mindspore-ci-bot a5ef5e2e84 !13511 [MS][LITE]add minddate import on 1.1.1 version
From: @sishuikang
Reviewed-by: @hangangqiang,@zhanghaibo5
Signed-off-by: @hangangqiang
2021-03-18 14:25:53 +08:00
mindspore-ci-bot fd811b4dd0 !13456 multdiceloss api
From: @lijiaqi0612
Reviewed-by: @zh_qh,@kisnwang
Signed-off-by: @kisnwang
2021-03-18 14:25:23 +08:00
mindspore-ci-bot f17689b39f !13516 [MSLITE][Develop] fix bug of fp32 grad op: unsorted_segment_sum
From: @yangruoqi713
Reviewed-by: @zhang_xue_tong,@hangangqiang
Signed-off-by: @zhang_xue_tong
2021-03-18 14:25:20 +08:00
huangbingjian f4ac2c7dbd modify network_define for fasterrcnn/maskrcnn/maskrcnn_mobilenetv/deeptext 2021-03-18 14:18:47 +08:00
zhangyi d9b152f2ef fix error format of api comments. 2021-03-18 12:48:15 +08:00
mindspore-ci-bot 83b56cac85 !13497 [MD] Fix error message of SoftDvpp Ops and some examples in docs
From: @tiancixiao
Reviewed-by: @heleiwang,@liucunwei
Signed-off-by: @liucunwei
2021-03-18 11:54:31 +08:00
mindspore-ci-bot defcc51641 !13304 refactor RDR to support single name
From: @luopengting
Reviewed-by: @ouwenchang,@lixiaohui33
Signed-off-by: @lixiaohui33
2021-03-18 11:40:30 +08:00
changzherui c87547836d fix spelling errors 2021-03-18 11:37:25 +08:00
LianLiguang cd7ff5e60b add check for embeding look up 2021-03-18 11:32:27 +08:00
wilfChen a46d0c55fb register akg kernel 2021-03-18 11:26:12 +08:00
mindspore-ci-bot ad1658b928 !13514 [MS][LITE]fix bug of strided slice
From: @fuzhiye
Reviewed-by: @zhang_xue_tong,@zhanghaibo5
Signed-off-by: @zhang_xue_tong
2021-03-18 11:16:28 +08:00
lvliang c66f21b1e8 update_doc_for_forwardvalueandgrad 2021-03-18 11:15:02 +08:00
chenhaozhe 15d37e5db9 improve convergence of loss in bert 2021-03-18 10:37:29 +08:00
yangruoqi713 a0e0bb5f68 [MSLITE][Develop] fix bug of fp32 grad op: unsorted_segment_sum 2021-03-18 10:31:07 +08:00
mindspore-ci-bot 34b16e6a64 !13466 Break the shared_ptr loop, solve memory leak problem
From: @zhangzhaoju
Reviewed-by: @wilfchen,@chujinjin
Signed-off-by: @chujinjin
2021-03-18 10:30:39 +08:00
mindspore-ci-bot 4b319bbac7 !13485 [MS][LITE][GPU]GPU support print profiling info
From: @chenzupeng
Reviewed-by: @ddwsky,@zhanghaibo5
Signed-off-by: @ddwsky
2021-03-18 10:28:47 +08:00
mindspore-ci-bot 4e1e16c335 !13452 numpy-native remove maximum minimum from ascend ci
From: @jachua
Reviewed-by: 
Signed-off-by:
2021-03-18 10:13:28 +08:00
fuzhiye 6647e0e6ac fix bug of strided slice 2021-03-18 10:12:59 +08:00
mindspore-ci-bot 4a1b9e4a12 !13431 Add defer_inline flag for on-grading primal graph.
From: @zh_qh
Reviewed-by: @ginfung
Signed-off-by: @ginfung
2021-03-18 10:09:59 +08:00
hukang hwx963878 79aff67925 import minddata 2021-03-18 10:04:36 +08:00
mindspore-ci-bot b45f289801 !13504 Add check of GetPerspectiveTransform and GetAffineTransform
From: @shenwei41
Reviewed-by: @liucunwei,@pandoublefeng
Signed-off-by: @liucunwei
2021-03-18 09:57:52 +08:00
mindspore-ci-bot e8c9b52fb5 !13419 remove java duplicate output file & adjust cropper and demo
From: @yeyunpeng2020
Reviewed-by: 
Signed-off-by:
2021-03-18 09:42:10 +08:00
z00512249 54708a8714 add splice kernel && opcoders 2021-03-18 09:40:56 +08:00
mindspore-ci-bot 1af7f0fd1e !13500 [MD][3rd] opencv link openblas issue
From: @xiefangqi
Reviewed-by: @liucunwei,@jonyguo,@heleiwang
Signed-off-by: @liucunwei
2021-03-18 09:40:37 +08:00
mindspore-ci-bot d51e1e0a9d !13501 modify scripts to support running UT with ASan
From: @zhangbuxue
Reviewed-by: @zh_qh,@ginfung
Signed-off-by: @zh_qh
2021-03-18 09:37:06 +08:00
mindspore-ci-bot dd90f7f055 !13506 dataset UT: Reinstate INFO logging and data verification - part 3
From: @cathwong
Reviewed-by: @robingrosman,@liucunwei
Signed-off-by: @robingrosman
2021-03-18 09:35:22 +08:00
mindspore-ci-bot 6acf4938e5 !13488 [MSLITE][Develop] support al_bert inference
From: @yangruoqi713
Reviewed-by: @zhang_xue_tong,@zhanghaibo5
Signed-off-by: @zhang_xue_tong
2021-03-18 09:32:42 +08:00
mindspore-ci-bot f446abe4ca !13493 [auto-monad] Fix primitive_target and IsFeatureMapOutput check for Load
From: @hwhewei
Reviewed-by: @zh_qh,@ginfung
Signed-off-by: @zh_qh
2021-03-18 09:29:40 +08:00
mindspore-ci-bot 69bbf161e9 !13482 fixed assign rank id
From: @anancds
Reviewed-by: @limingqi107
Signed-off-by:
2021-03-18 09:28:29 +08:00
mindspore-ci-bot 06107de3f7 !11456 [MSLITE] fix tanhgrad issue bug
From: @zhengjun10
Reviewed-by: @zhang_xue_tong,@ivss,@ehaleva
Signed-off-by: @zhang_xue_tong
2021-03-18 09:28:02 +08:00
xcnick 3ae92a5e43 fix some op docs 2021-03-18 09:14:29 +08:00
Cathy Wong 1775c4e83d dataset: Reinstate INFO logging and data verification - part 3
Updates: GetItemAt, CreateFromVector, CreateScalar.
Add TEST_MS_LOG_MSTENSOR.
Wrap ASSERT_OK around Tensor function calls.
2021-03-17 13:48:04 -04:00
buxue 30891f34cc modify scripts to support running UT with ASan 2021-03-17 21:25:06 +08:00
chendongsheng 5ebd8fd391 fixed assign rank id 2021-03-17 21:15:53 +08:00
shenwei41 89f109a3f3 add check of operation 2021-03-17 20:29:17 +08:00
xiefangqi f460fc1cb5 opencv dislink openblas 2021-03-17 20:25:24 +08:00
Xiao Tianci 4f1dbc6cd5 correct error message of soft_dvpp ops and fix example of IterSampler 2021-03-17 19:39:46 +08:00
Jiaqi c8e866959f multdiceloss api 2021-03-17 19:33:30 +08:00
zhaozhenlong 9355cd9490 fix wrong scale npu weight format and activation
adjust npu pass to insert trans for each output kernel
2021-03-17 18:28:51 +08:00
chenzupeng eade4d8014 gpu support print profiling info 2021-03-17 17:53:27 +08:00
He Wei ce690a5489 [auto-monad] Fix primitive_target and IsFeatureMapOutput check for Load 2021-03-17 17:24:13 +08:00
baihuawei b71da51d86 add training attr for lstm 2021-03-17 17:06:48 +08:00
huangmengxi 38b49fb30e bug fix
fix ci shape 0 errors
2021-03-17 16:59:16 +08:00
yangruoqi713 4e9002329f [MSLITE][Develop] support al_bert inference 2021-03-17 16:50:57 +08:00
zhengjun10 91384f6955 fix tanhgrad bug 2021-03-17 16:48:37 +08:00
yeyunpeng2020 2c8ab3f483 adjust cropper and demo 2021-03-17 16:19:13 +08:00
Zhang Qinghua 8d36b00426 Add defer_inline flag for on-grading primal graph. 2021-03-17 14:10:00 +08:00
wang_shaocong d3db706c6e [MSLITE] Modify the ci test script 2021-03-17 11:55:39 +08:00
zhangzhaoju 659f181248 fix issue #I3ARG6
Break the shared_ptr loop, solve memory leak problem
2021-03-17 11:53:32 +08:00
luopengting c8ba7694c5 refactor RDR to support single name
1. support single name
2. add hash method for pair
3. move constructor and destructor of MemAddressInfo as public
4. remove graph_id
5. modify interval for somas info
2021-03-17 11:34:05 +08:00
changzherui 8d907a3c71 modify export mindir 2021-03-16 19:57:10 +08:00
neoming 05c0027149 add resnext101 64x4d model for mindspore 2021-03-16 18:02:34 +08:00
zhangxinfeng3 1029695b54 modify supported platform of gamma and poisson 2021-03-16 11:10:58 +08:00
2138 changed files with 46489 additions and 29765 deletions

View File

@@ -9,7 +9,9 @@ include(${CMAKE_SOURCE_DIR}/cmake/options.cmake)
include(${CMAKE_SOURCE_DIR}/cmake/check_requirements.cmake)
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_SOURCE_DIR}/cmake/modules/")
if(NOT CMAKE_SYSTEM_NAME MATCHES "Windows")
add_compile_definitions(_GLIBCXX_USE_CXX11_ABI=0)
if(NOT ENABLE_GLIBCXX)
add_compile_definitions(_GLIBCXX_USE_CXX11_ABI=0)
endif()
endif()
if(${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
@@ -49,7 +51,7 @@ include_directories(${CMAKE_CURRENT_SOURCE_DIR}/third_party/flatbuffers/include)
include_directories(${CMAKE_CURRENT_SOURCE_DIR}/third_party/flatbuffers/include/flatbuffers)
include(${CMAKE_SOURCE_DIR}/cmake/dependency_utils.cmake)
find_package(Python3 3.7 COMPONENTS Interpreter Development)
find_package(Python3 COMPONENTS Interpreter Development)
if(Python3_FOUND)
set(PYTHON_INCLUDE_DIRS "${Python3_INCLUDE_DIRS}")
set(PYTHON_LIBRARIES "${Python3_LIBRARIES}")

View File

@@ -30,23 +30,24 @@ For individual contributor, please refer to [ICLA online document](https://www.m
Please follow this style to make MindSpore easy to review, maintain and develop.
* Coding guidelines
- Coding guidelines
The *Python* coding style suggested by [Python PEP 8 Coding Style](https://pep8.org/) and *C++* coding style suggested by [Google C++ Coding Guidelines](http://google.github.io/styleguide/cppguide.html) are used in MindSpore community.
* Unittest guidelines
- Unittest guidelines
The *Python* unittest style suggested by [pytest](http://www.pytest.org/en/latest/) and *C++* unittest style suggested by [Googletest Primer](https://github.com/google/googletest/blob/master/docs/primer.md) are used in MindSpore community.
### Fork-Pull development model
* Fork MindSpore repository
- Fork MindSpore repository
Before submitting code to MindSpore project, please make sure that this project has been forked to your own repository. It means that there will be parallel development between the MindSpore repository and your own repository, so be careful to avoid inconsistency between them.
* Clone the remote repository
- Clone the remote repository
If you want to download the code to the local machine, `git` is the best way:
```shell
# For GitHub
git clone https://github.com/{insert_your_forked_repo}/mindspore.git
@@ -56,18 +57,20 @@ Please follow this style to make MindSpore easy to review, maintain and develop.
git remote add upstream https://gitee.com/mindspore/mindspore.git
```
* Develop code locally
- Develop code locally
To avoid inconsistency between multiple branches, checking out to a new branch is `SUGGESTED`:
```shell
git checkout -b {new_branch_name} origin/master
```
Then you can change the code arbitrarily.
* Push the code to the remote repository
- Push the code to the remote repository
After updating the code, you should push the update in the formal way:
```shell
git add .
git status # Check the update status
@@ -76,7 +79,7 @@ Please follow this style to make MindSpore easy to review, maintain and develop.
git push origin {new_branch_name}
```
* Pull a request to MindSpore repository
- Pull a request to MindSpore repository
In the last step, you need to pull a compare request between your new branch and the MindSpore `master` branch. After finishing the pull request, the Jenkins CI will be automatically set up for building test.
@@ -101,11 +104,11 @@ When reporting issues, refer to this format:
### Propose PRs
* Raise your idea as an *issue* on [GitHub](https://github.com/mindspore-ai/mindspore/issues) or [Gitee](https://gitee.com/mindspore/mindspore/issues)
* If it is a new feature that needs lots of design details, a design proposal should also be submitted.
* After reaching consensus in the issue discussions and design proposal reviews, complete the development on the forked repo and submit a PR.
* No PR is permitted to be merged until it receives **2+ LGTM** from approvers. Please NOTICE that an approver is NOT allowed to add *LGTM* to his own PR.
* After PR is sufficiently discussed, it will get merged, abandoned or rejected depending on the outcome of the discussion.
- Raise your idea as an *issue* on [GitHub](https://github.com/mindspore-ai/mindspore/issues) or [Gitee](https://gitee.com/mindspore/mindspore/issues)
- If it is a new feature that needs lots of design details, a design proposal should also be submitted.
- After reaching consensus in the issue discussions and design proposal reviews, complete the development on the forked repo and submit a PR.
- No PR is permitted to be merged until it receives **2+ LGTM** from approvers. Please NOTICE that an approver is NOT allowed to add *LGTM* to his own PR.
- After PR is sufficiently discussed, it will get merged, abandoned or rejected depending on the outcome of the discussion.
**PRs advisory:**

View File

@@ -85,7 +85,7 @@ For installation using `pip`, take `CPU` and `Ubuntu-x86` build version as an ex
1. Download whl from [MindSpore download page](https://www.mindspore.cn/versions/en), and install the package.
```bash
pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.1.0/MindSpore/cpu/ubuntu_x86/mindspore-1.1.0-cp37-cp37m-linux_x86_64.whl
pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.2.0-rc1/MindSpore/cpu/ubuntu_x86/mindspore-1.2.0rc1-cp37-cp37m-linux_x86_64.whl
```
2. Run the following command to verify the install.

View File

@@ -82,7 +82,7 @@ MindSpore provides build options across multiple backends
1. Please download and install the whl package from the [MindSpore download page](https://www.mindspore.cn/versions).
```bash
pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.1.0/MindSpore/cpu/ubuntu_x86/mindspore-1.1.0-cp37-cp37m-linux_x86_64.whl
pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.2.0-rc1/MindSpore/cpu/ubuntu_x86/mindspore-1.2.0rc1-cp37-cp37m-linux_x86_64.whl
```
2. Run the following command to verify the installation.

View File

@@ -1,6 +1,68 @@
# MindSpore 1.2.0 Release Notes
# MindSpore 1.2.0
## MindSpore
## MindSpore 1.2.0 Release Notes
### Major Features and Improvements
#### NewModels
- [STABLE] Add CV models on Ascend: 3D Unet, Unet++, SSD-Resnet50-fpn, SSD-VGG16, crnn_seq2seq_ocr for BSI, CTPN, resnet18, DPN
- [STABLE] Add CV models on GPU: Faster-RCNN
- [STABLE] Add NLP models on Ascend: NAML, Fasttext, GRU, LSTM
- [BETA] Add TPRR: Thinking Path Re-Ranker, an original ranking-based framework for Multi-Hop Question Answering, which won first place on the HotpotQA leaderboard.(Ascend)
#### FrontEnd
- [STABLE] Support side effect expressions to ensure that the execution order of user's semantics is correct.(Ascend/GPU/CPU)
- [STABLE] Support calculating the gradient for networks that contain non-Tensor input parameters (int, float, bool, mstype.int, mstype.float, mstype.uint, mstype.bool_, tuple, list, dict).(Ascend/GPU/CPU) See the sketch after this list.
- [STABLE] Support the inverse of a bool Tensor.(Ascend/GPU/CPU)
- [STABLE] Unify the interface `isinstance`.(Ascend/GPU/CPU)
- [STABLE] Support negative indexes.(Ascend/GPU/CPU)
- [STABLE] Support 110+ Numpy-like interfaces in mindspore.numpy.(Ascend/GPU/CPU)
- [STABLE] Support export/load mindir model with a size greater than 2 GB.
- [STABLE] The optimizer supports gradient centralization.(Ascend)
- [STABLE] Support auc metric, roc metric, bleu score metric, confusion matrix metric, cosine similarity metric, dice metric, hausdorff distance metric, occlusion sensitivity metric, perplexity metric, mean surface distance metric, root mean surface distance metric.
- [STABLE] Support using EmbeddingLookup with cache.(Ascend)
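A minimal sketch of the non-Tensor-argument gradient support above, assuming the 1.2.0 Python API; the toy cell, names and values are illustrative only, not taken from the release notes:

```python
import numpy as np
import mindspore.nn as nn
import mindspore.ops as ops
from mindspore import Tensor

class ScaleNet(nn.Cell):
    """A toy cell whose construct takes a plain Python int argument."""
    def construct(self, x, scale):
        return x * scale  # `scale` is a non-Tensor input

net = ScaleNet()
grad_op = ops.GradOperation()  # gradient w.r.t. the first input (x)
x = Tensor(np.ones((2, 2)).astype(np.float32))
grads = grad_op(net)(x, 3)  # works in 1.2.0 although `scale` is an int
```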
#### Auto Parallel
- [STABLE] Support AllGather and ReduceScatter fusion.(Ascend)
- [STABLE] Support gradient accumulation feature in auto parallel mode.(Ascend/GPU)
- [STABLE] Support running parallel optimizer with gradient accumulation.(Ascend)
- [STABLE] Add the configuration of communication operators' fusion.(Ascend)
#### Executor
- [STABLE] Support inference with Nvidia GPU.
- [STABLE] Support data parallelism in PyNative mode.(Ascend/GPU)
- [STABLE] Optimize LSTM inference memory consumption in Graph mode with CPU.
#### Sponge
- [STABLE] Add SPONGE modules for molecular dynamics simulation, including Bond, Angle, Dihedral, Non Bond 14, NeighborList, Particle Mesh Ewald, Langevin MD and LIUJIAN MD.(GPU)
#### DataSet
- [STABLE] If the libnuma library is installed in the environment, you can run `export DATASET_ENABLE_NUMA=True` to configure NUMA binding. In multi-card training scenarios, the training data processing speed can be improved, thereby improving the network training efficiency. See the sketch after this list.
- [STABLE] Unify API Tensor structure of Training/Inference interfaces in C++ SDK.
- [STABLE] Optimize duplicated Decode in data preprocessing using cache to improve preprocessing efficiency.
- [STABLE] Support eager mode to run data augmentation in Python & C++.
- [STABLE] Support more data augmentation operators (e.g. Affine, Perspective) in MindSpore-Lite.
- [STABLE] Support light pipeline to process MindData in MindSpore-Lite training.
- [STABLE] Support more data preprocessing operators based on the DVPP hardware module, which can be used on the Ascend310 platform.
- [STABLE] Support copy-free property for data in Ascend310 inference process scenarios.
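As a sketch of the NUMA switch from the first item in this list: the variable can also be set from Python before the dataset pipeline starts, which this example assumes; the dataset path is a placeholder.

```python
import os

# Equivalent to `export DATASET_ENABLE_NUMA=True`; set it before the dataset
# engine spins up its worker processes.
os.environ["DATASET_ENABLE_NUMA"] = "True"

import mindspore.dataset as ds

# Placeholder path; point this at a real CIFAR-10 binary directory.
dataset = ds.Cifar10Dataset("./cifar-10-batches-bin", num_parallel_workers=8)
```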
#### Running Data Recorder
- [STABLE] Support running data recorder (RDR) for exception demarcation. See the configuration sketch after this list.
- [STABLE] Provide records of multi-stage computational graphs, memory allocation information, graph execution order, stream execution order and task debug information when a "run task error" or "distribute task failed" occurs. (Ascend)
- [STABLE] Provide records of multi-stage computational graphs, memory allocation information and graph execution order when a "SyncStream error" occurs. (GPU)
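A small configuration sketch for the RDR feature above, using the `MS_RDR_ENABLE`/`MS_RDR_PATH` environment variables from the commit log earlier in this changeset; the dump path is a placeholder, and a user config file takes precedence over these variables.

```python
import os

# "1" turns the running data recorder on; any value outside {0, 1} is
# treated as 0 and a warning is printed.
os.environ["MS_RDR_ENABLE"] = "1"
# Placeholder directory for RDR records (graphs, execution order, ...).
os.environ["MS_RDR_PATH"] = "/tmp/rdr"

from mindspore import context
context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")
```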
#### 3D Feature
- [STABLE] Support 3D ops: Conv3D, Conv3DBackpropInput, Conv3DBackpropFilter, Conv3DTranspose, BiasAdd, BiasAddGrad, PReLU, Transpose, Reshape, transdata, StrideSlice, MaxPool3D, MaxPool3DGrad, BinaryCrossEntropy, SigmoidCrossEntropyWithLogits, SigmoidCrossEntropyWithLogitsGrad, SoftmaxCrossEntropyWithLogits, BatchNorm3d, BatchNorm3dGrad, Dropout3d.
- [STABLE] Support RMSELoss loss function, MAELoss loss function, FocalLoss loss function, DiceLoss binary loss function, and MultiClassDiceLoss multi-type loss function for 2D/3D networks. See the sketch after this list.
- [STABLE] Add optimizer: AdamApplyOne(3D), ApplyMomentum(3D), SGD(3D).
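A minimal sketch of one of the new loss functions, `DiceLoss`; the shapes and values are illustrative and assume the 1.2.0 `mindspore.nn` API.

```python
import numpy as np
import mindspore.nn as nn
from mindspore import Tensor

loss_fn = nn.DiceLoss(smooth=1e-5)
logits = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]], dtype=np.float32))
labels = Tensor(np.array([[0, 1], [1, 0], [0, 1]], dtype=np.float32))
loss = loss_fn(logits, labels)  # scalar Tensor; smaller means better overlap
print(loss)
```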
### API Change
@@ -8,6 +70,79 @@
##### Python API
###### `mindspore.numpy.array()`, `mindspore.numpy.asarray()`, `mindspore.numpy.asfarray()`, `mindspore.numpy.copy()` now support GRAPH mode, but cannot accept `numpy.ndarray` as input arguments anymore([!12726](https://gitee.com/mindspore/mindspore/pulls/12726))
Previously, these interfaces could accept numpy.ndarray as arguments and convert numpy.ndarray to Tensor, but could not be used in GRAPH mode.
However, currently the MindSpore Parser cannot parse numpy.ndarray in a JIT graph. To support these interfaces in graph mode, we had to remove `numpy.ndarray` support. That said, users can still use `Tensor` to convert `numpy.ndarray` to tensors.
<table>
<tr>
<td style="text-align:center"> 1.1.1 </td> <td style="text-align:center"> 1.2.0 </td>
</tr>
<tr>
<td>
```python
>>> import mindspore.numpy as mnp
>>> import numpy
>>>
>>> nd_array = numpy.array([1,2,3])
>>> tensor = mnp.asarray(nd_array) # this line cannot be parsed in GRAPH mode
```
</td>
<td>
```python
>>> import mindspore.numpy as mnp
>>> import numpy
>>>
>>> tensor = mnp.asarray([1,2,3]) # this line can be parsed in GRAPH mode
```
</td>
</tr>
</table>
###### mindspore.numpy interfaces remove support for keyword arguments `out` and `where`([!12726](https://gitee.com/mindspore/mindspore/pulls/12726))
Previously, we had incomplete support for the keyword arguments `out` and `where` in mindspore.numpy interfaces; the `out` argument was only functional when the `where` argument was also provided, and `out` could not be used to pass a reference to numpy functions. Therefore, we have removed these two arguments to avoid any confusion users may have. Their original functionality can be found in [np.where](https://www.mindspore.cn/doc/api_python/zh-CN/master/mindspore/numpy/mindspore.numpy.where.html#mindspore.numpy.where)
<table>
<tr>
<td style="text-align:center"> 1.1.1 </td> <td style="text-align:center"> 1.2.0 </td>
</tr>
<tr>
<td>
```python
>>> import mindspore.numpy as np
>>>
>>> a = np.ones((3,3))
>>> b = np.ones((3,3))
>>> out = np.zeros((3,3))
>>> where = np.asarray([[True, False, True],[False, False, True],[True, True, True]])
>>> res = np.add(a, b, out=out, where=where) # `out` cannot be used as a reference, therefore it is misleading
```
</td>
<td>
```python
>>> import mindspore.numpy as np
>>>
>>> a = np.ones((3,3))
>>> b = np.ones((3,3))
>>> out = np.zeros((3,3))
>>> where = np.asarray([[True, False, True],[False, False, True],[True, True, True]])
>>> res = np.add(a, b)
>>> out = np.where(where, x=res, y=out) # instead of np.add(a, b, out=out, where=where)
```
</td>
</tr>
</table>
###### Turn `ops.MakeRefKey` into an internal interface ([!12010](https://gitee.com/mindspore/mindspore/pulls/12010))
Previously, MakeRefKey was an external interface that was not used; it is now an internal interface with the same usage. We do not recommend using this interface, and we will remove the relevant introduction of this interface from the official website.
@@ -16,6 +151,534 @@ Previously MakeRefKey is an external interface that is not used, now make it an
Previously the number of outputs of these operators was different on different backends. To unify their definition, we changed their output on the Ascend backend from multiple outputs to a single one.
##### `P.FusedBatchNorm`, `P.FusedBatchNormEx` deleted ([!12115](https://gitee.com/mindspore/mindspore/pulls/12115))
The FusedBatchNorm and FusedBatchNormEx interfaces have been deleted. Please use the BatchNorm operator to replace them.
##### `MetaTensor` deleted ([!10325](https://gitee.com/mindspore/mindspore/pulls/10325))
The MetaTensor interface has been deleted. The function of MetaTensor has been integrated into Tensor.
###### `ControlDepend` is deleted, use `Depend` instead. The decorator `@C.add_flags(has_effect=True)` does not work. ([!13793](https://gitee.com/mindspore/mindspore/pulls/13793))
Previously, we used ControlDepend to control the execution order of multiple operators. In version 1.2.0, MindSpore introduces the auto-monad side effect expression to ensure that the execution order of user's semantics is correct. Therefore, ControlDepend is deleted and Depend is recommended.
In most scenarios, if operators have IO side effects (such as print) or memory side effects (such as assign), they will be executed according to the user's semantics. In some scenarios, if the two operators A and B have no order dependency, and A must be executed before B, we recommend using Depend to specify their execution order. See the API documentation of the Depend operator for specific usage.
<table>
<tr>
<td style="text-align:center"> 1.1.1 </td> <td style="text-align:center"> 1.2.0 </td>
</tr>
<tr>
<td>
```python
In some side-effect scenarios, we need to ensure the execution order of operators.
In order to ensure that operator A is executed before operator B, it is recommended
to insert the Depend operator between operators A and B.
Previously, the ControlDepend operator was used to control the execution order.
Since the ControlDepend operator is deprecated from version 1.1, it is recommended
to use the Depend operator instead. The replacement method is as follows::
a = A(x) ---> a = A(x)
b = B(y) ---> y = Depend(y, a)
ControlDepend(a, b) ---> b = B(y)
```
</td>
<td>
```python
In most scenarios, if operators have IO side effects or memory side effects,
they will be executed according to the user's semantics. In some scenarios,
if the two operators A and B have no order dependency, and A must be executed
before B, we recommend using Depend to specify their execution order. The
usage method is as follows::
a = A(x) ---> a = A(x)
b = B(y) ---> y = Depend(y, a)
---> b = B(y)
```
</td>
</tr>
</table>
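To make the recommended pattern concrete, here is a minimal runnable sketch of `Depend` with the 1.2.0 operator set; the cell and parameter names are illustrative.

```python
import numpy as np
import mindspore.nn as nn
import mindspore.ops as ops
from mindspore import Tensor, Parameter

class Net(nn.Cell):
    """B must observe the value written by A, so B is made to depend on A."""
    def __init__(self):
        super(Net, self).__init__()
        self.assign = ops.Assign()  # operator A: memory side effect
        self.depend = ops.Depend()
        self.w = Parameter(Tensor(np.zeros((2, 2), np.float32)), name="w")

    def construct(self, x, y):
        a = self.assign(self.w, x)  # a = A(x)
        y = self.depend(y, a)       # y = Depend(y, a)
        return y + self.w           # b = B(y), guaranteed to run after A

net = Net()
x = Tensor(np.ones((2, 2), np.float32))
y = Tensor(np.ones((2, 2), np.float32))
print(net(x, y))
```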
After the introduction of the auto-monad side effect expression feature, the decorator `@C.add_flags(has_effect=True)` does not work. If the decorator is used in a script, please modify it. Take the overflow identification operator (without side effects) as an example; the modification method is as follows:
<table>
<tr>
<td style="text-align:center"> 1.1.1 </td> <td style="text-align:center"> 1.2.0 </td>
</tr>
<tr>
<td>
```python
@C.add_flags(has_effect=True)
def construct(self, *inputs):
...
loss = self.network(*inputs)
init = self.allo_status()
self.clear_status(init)
...
```
</td>
<td>
```python
def construct(self, *inputs):
...
loss = self.network(*inputs)
init = self.allo_status()
init = F.depend(init, loss)
clear_status = self.clear_status(init)
...
```
</td>
</tr>
</table>
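A fleshed-out sketch of the 1.2.0 column above; it assumes the overflow-status calls stand for the NPU float-status primitives (`NPUAllocFloatStatus` and friends), which is an inference from common usage rather than something the note states.

```python
import mindspore.nn as nn
import mindspore.ops.operations as P
from mindspore.ops import functional as F

class WithOverflowCheck(nn.Cell):
    """Ordering enforced by F.depend instead of @C.add_flags(has_effect=True)."""
    def __init__(self, network):
        super(WithOverflowCheck, self).__init__()
        self.network = network
        self.alloc_status = P.NPUAllocFloatStatus()  # assumed op behind `allo_status`
        self.clear_status = P.NPUClearFloatStatus()
        self.get_status = P.NPUGetFloatStatus()
        self.reduce_sum = P.ReduceSum(keep_dims=False)

    def construct(self, *inputs):
        loss = self.network(*inputs)
        init = self.alloc_status()
        init = F.depend(init, loss)             # allocate only after loss is ready
        clear_status = self.clear_status(init)
        # ... the computation to be checked goes here, anchored on
        # `clear_status` with further F.depend calls ...
        init = F.depend(init, clear_status)
        get_status = self.get_status(init)
        init = F.depend(init, get_status)
        flag_sum = self.reduce_sum(init, (0,))
        return loss, flag_sum > 0               # non-zero bits indicate overflow
```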
##### C++ API
###### C++ API supports dual ABI now.([!12432](https://gitee.com/mindspore/mindspore/pulls/12432))
1.1.1 supports only the old ABI. Currently, both the new and the old ABI are supported.
<table>
<tr>
<td style="text-align:center"> 1.1.1 </td> <td style="text-align:center"> 1.2.0 </td>
</tr>
<tr>
<td>
```cmake
add_compile_definitions(_GLIBCXX_USE_CXX11_ABI=0)
```
</td>
<td>
```cmake
add_compile_definitions(_GLIBCXX_USE_CXX11_ABI=0) # the old ABI is supported
add_compile_definitions(_GLIBCXX_USE_CXX11_ABI=1) # the new ABI is supported, too
# write nothing, use the new ABI as default
```
</td>
</tr>
</table>
###### Context refactor.([!13515](https://gitee.com/mindspore/mindspore/pulls/13515))
The `Context` class is refactored. For details, see the API docs.
<table>
<tr>
<td style="text-align:center"> 1.1.1 </td> <td style="text-align:center"> 1.2.0 </td>
</tr>
<tr>
<td>
```cpp
GlobalContext::SetGlobalDeviceTarget(kDeviceTypeAscend310); // set device target to ascend310
GlobalContext::SetGlobalDeviceID(0); // set device id to 0
auto model_context = std::make_shared<ModelContext>(); // create a model context
ModelContext::SetInsertOpConfigPath(model_context, "./aipp.cfg"); // set aipp config file to ./aipp.cfg
```
</td>
<td>
```cpp
auto model_context = std::make_shared<Context>(); // create a model context
auto ascend310_info = std::make_shared<Ascend310DeviceInfo>();
model_context->MutableDeviceInfo().push_back(ascend310_info); // set device target to ascend310
ascend310_info->SetDeviceID(0); // set device id to 0
ascend310_info->SetInsertOpConfigPath("./aipp.cfg"); // set aipp config file to ./aipp.cfg
```
</td>
</tr>
</table>
###### LoadModel interface changes.([!13515](https://gitee.com/mindspore/mindspore/pulls/13515))
`LoadModel` is renamed `Load`. No exception is thrown now; instead, the returned status should be checked.
<table>
<tr>
<td style="text-align:center"> 1.1.1 </td> <td style="text-align:center"> 1.2.0 </td>
</tr>
<tr>
<td>
```cpp
try {
auto graph = Serialization::LoadModel(model_file_path, kMindIR);
} catch (...) { ... }
```
</td>
<td>
```cpp
Graph graph;
auto ret = Serialization::Load(model_file_path, kMindIR, &graph);
if (ret != kSuccess) { ... }
```
</td>
</tr>
</table>
###### Model ctor changes.([!13515](https://gitee.com/mindspore/mindspore/pulls/13515))
`Model` uses a non-parameter ctor now, and arguments are passed in through `Build`.
<table>
<tr>
<td style="text-align:center"> 1.1.1 </td> <td style="text-align:center"> 1.2.0 </td>
</tr>
<tr>
<td>
```cpp
Model net(net_cell, model_context);
auto ret = net.Build();
if (ret != kSuccess) { ... }
```
</td>
<td>
```cpp
Model net;
auto ret = net.Build(net_cell, model_context);
if (ret != kSuccess) { ... }
```
</td>
</tr>
</table>
###### MSTensor::CreateTensor returns a native pointer now.([!13515](https://gitee.com/mindspore/mindspore/pulls/13515))
`MSTensor::CreateTensor` and `MSTensor::CreateRefTensor` return a native pointer now, which needs to be destroyed by `DestroyTensorPtr`.
<table>
<tr>
<td style="text-align:center"> 1.1.1 </td> <td style="text-align:center"> 1.2.0 </td>
</tr>
<tr>
<td>
```cpp
auto tensor = MSTensor::CreateTensor(xxx, xxx, ...);
auto name = tensor.Name();
```
</td>
<td>
```cpp
auto tensor = MSTensor::CreateTensor(xxx, xxx, ...);
auto name = tensor->Name();
MSTensor::DestroyTensorPtr(tensor);
```
</td>
</tr>
</table>
#### New features
##### Python API
- Add SPONGE functions: `mindspore.ops.operations.BondForceWithAtomEnergy`, `mindspore.ops.operations.AngleForceWithAtomEnergy`, `mindspore.ops.operations.DihedralForceWithAtomEnergy`, `mindspore.ops.operations.Dihedral14LJCFForceWithAtomEnergy`, `mindspore.ops.operations.LJForceWithPMEDirectForce`, `mindspore.ops.operations.PMEExcludedForce`, `mindspore.ops.operations.PMEReciprocalForce`,`mindspore.ops.operations.BondEnergy`, `mindspore.ops.operations.AngleEnergy`,`mindspore.ops.operations.DihedralEnergy`, `mindspore.ops.operations.Dihedral14LJEnergy`, `mindspore.ops.operations.Dihedral14CFEnergy`,`mindspore.ops.operations.LJEnergy`, `mindspore.ops.operations.PMEEnergy`. All operators are supported in `GPU`.
#### Deprecations
##### Python API
###### `nn.MatMul` is now deprecated in favor of `ops.matmul` ([!12817](https://gitee.com/mindspore/mindspore/pulls/12817))
[ops.matmul](https://www.mindspore.cn/doc/api_python/zh-CN/master/mindspore/ops/mindspore.ops.matmul.html#mindspore.ops.matmul) follows the API of [numpy.matmul](https://numpy.org/doc/stable/reference/generated/numpy.matmul.html) as closely as possible. As a function interface, [ops.matmul](https://www.mindspore.cn/doc/api_python/zh-CN/master/mindspore/ops/mindspore.ops.matmul.html#mindspore.ops.matmul) is applied without instantiation, as opposed to `nn.MatMul`, which should only be used as a class instance.
<table>
<tr>
<td style="text-align:center"> 1.1.1 </td> <td style="text-align:center"> 1.2.0 </td>
</tr>
<tr>
<td>
```python
>>> import numpy as np
>>> from mindspore import Tensor, nn
>>>
>>> x = Tensor(np.ones((2, 3)).astype(np.float32))
>>> y = Tensor(np.ones((3, 4)).astype(np.float32))
>>> nn.MatMul()(x, y)
```
</td>
<td>
```python
>>> import numpy as np
>>> from mindspore import Tensor, ops
>>>
>>> x = Tensor(np.ones((2, 3)).astype(np.float32))
>>> y = Tensor(np.ones((3, 4)).astype(np.float32))
>>> ops.matmul(x, y)
```
</td>
</tr>
</table>
### Bug fixes
#### FrontEnd
- fix the null pointer problem of evaluator in control flow.([!13312](https://gitee.com/mindspore/mindspore/pulls/13312))
- fix parameter naming conflict bug for CellList and SequentialCell. ([!13260](https://gitee.com/mindspore/mindspore/pulls/13260))
#### Executor
- fix executor pending task not execute in some heterogeneous cases.([!13465](https://gitee.com/mindspore/mindspore/pulls/13465))
- add passes to support frontend IR unification, including following operations: SliceGrad([!11783](https://gitee.com/mindspore/mindspore/pulls/11783)), ApplyFtrl, ApplyMomentum, ApplyRMSProp, CenteredRMSProp([!11895](https://gitee.com/mindspore/mindspore/pulls/11895)), AvgPoolGrad([!12813](https://gitee.com/mindspore/mindspore/pulls/12813)), BatchNorm([!12115](https://gitee.com/mindspore/mindspore/pulls/12115))
#### Dataset
- Fix getter functions (e.g. GetDatasetSize) terminating abnormally when using Python multi-processing. ([!13571](https://gitee.com/mindspore/mindspore/pulls/13571), [!13823](https://gitee.com/mindspore/mindspore/pulls/13823))
- Fix unclear error log of data augmentation operators. ([!12398](https://gitee.com/mindspore/mindspore/pulls/12398), [!12883](https://gitee.com/mindspore/mindspore/pulls/12883), [!13176](https://gitee.com/mindspore/mindspore/pulls/13176))
- Fix profiling performing abnormally when sink_size = False, as data is saved later than the profiling analysis. ([!13944](https://gitee.com/mindspore/mindspore/pulls/13944))
## MindSpore Lite
### Major Features and Improvements
#### Converter and runtime
1. Support TensorFlow models in Converter, except aware-training models.
2. Add fusion pattern for same horizontal operators in Converter.
3. Support Jar on x86_64 systems for integrating into servers with a Java backend conveniently.
4. Provide unified runtime API for developers to reuse their code between cloud side and end side.[BETA]
5. Improve control-flow capabilities continually: support GRU fusion in Converter; support weight-quant for control-flow models; support control-flow model inference with half precision; support nested control-flow models.[BETA]
#### ARM backend optimization
1. Add NLP-dependent float16 operators (like LSTM) to enhance inference performance.
2. Optimize operators: lstm, gru, depthwise.
3. Add 6 NPU operators (like FullConnection), and fix some bugs where buildIR failed.
#### OpenCL backend
1. Add new ops (10+ new ops, 72 ops in total).
2. Performance optimization: with memory layout optimization and block tiling, performance improved by 30% compared to version 1.1 on Adreno GPU.
3. Initialization time optimization: initialization time improved by 100% vs MSLITE version 1.1 by storing the kernel cache as binary.
4. Support Java call on Mali or Adreno GPU.
#### Post quantization
1. Support quantization of gather and lstm ops.
2. Support quantizing TF Lite models with sub-graph nodes.
3. Add quantization strategy to decide whether to quantize ops or not, achieving less accuracy loss and a higher compression rate.
#### Training on Device
1. Virtual batching: use mini-batches to mimic a large batch in theory, with low RAM consumption.
2. Converter unification: the tod and iod converters are no longer compiled separately.
3. Performance optimization of BWD ops.
4. TrainLoop with Off-The-Shelf Functionality blocks, like LR scheduler, Loss Monitor, Ckpt Saver, Accuracy Monitor.
5. Integration of code with Minddata lite.
6. Support more networks (googlenet, densenet, shufflenetv2, nin, vgg) and operators.
#### Codegen
1. Support 79 ops for the ARM platform and all CMSIS ops for Arm Cortex-M Series.
2. Multi-platform support, including Android and IoT devices.
3. Support offline model weight preprocessing while compiling.
4. Support offline memory reuse computing for minimum runtime buffer size.
### API Change
#### API Incompatible Change
##### C++ API
###### Add header file named lite_types.h for some common data structs. ([!12262](https://gitee.com/mindspore/mindspore/pulls/12262))
Previously, some common data structs such as `CpuBindMode` and `DeviceType` were in context.h, which may cause cross-dependencies between headers. So we created a new header named lite_types.h for these common data structs and moved `CpuBindMode` and `DeviceType` from context.h into lite_types.h. A short usage sketch follows the table below.
<table>
<tr>
<td style="text-align:center"> lite_types.h </td>
</tr>
<tr>
<td>
```cpp
namespace mindspore::lite {
/// \brief CpuBindMode defined for holding bind cpu strategy argument.
typedef enum {
NO_BIND, /**< no bind */
HIGHER_CPU, /**< bind higher cpu first */
MID_CPU /**< bind middle cpu first */
} CpuBindMode;
/// \brief DeviceType defined for holding user's preferred backend.
typedef enum {
DT_CPU, /**< CPU device type */
DT_GPU, /**< GPU device type */
DT_NPU /**< NPU device type */
} DeviceType;
} // namespace mindspore::lite
```
</td>
</tr>
</table>
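For code that only needs these enums, the migration is just an include change; a minimal sketch (the include path is an assumption):

```cpp
#include "include/lite_types.h"  // assumed install path; these enums previously came from context.h

// Only the header moved; the enum names and values are unchanged.
mindspore::lite::CpuBindMode bind_mode = mindspore::lite::HIGHER_CPU;
mindspore::lite::DeviceType device_type = mindspore::lite::DT_CPU;
```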
###### Add some new interfaces in ms_tensor.h for unified runtime API.([!13515](https://gitee.com/mindspore/mindspore/pulls/13515))
Previously, users could not create or modify `MSTensor`; all `MSTensor` instances were created and managed by the framework. However, users sometimes need to create or modify `MSTensor`, for example when pre-processing input data. So we provide two new interfaces in ms_tensor.h: the `CreateTensor` interface for creating an `MSTensor`, and the `set_shape` interface for modifying its shape. A combined usage sketch follows the two tables below.
<table>
<tr>
<td style="text-align:center"> CreateTensor </td>
</tr>
<tr>
<td>
```cpp
/// \brief Create a MSTensor.
///
/// \return Pointer to an instance of MindSpore Lite MSTensor.
static MSTensor *CreateTensor(const std::string &name, TypeId type, const std::vector<int> &shape, const void *data,
size_t data_len);
```
</td>
</tr>
</table>
<table>
<tr>
<td style="text-align:center"> set_shape </td>
</tr>
<tr>
<td>
```cpp
/// \brief Set the shape of MSTensor.
virtual void set_shape(const std::vector<int> &shape) = 0;
```
</td>
</tr>
</table>
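Putting the two interfaces together, a hedged sketch (the include path, the `mindspore::tensor` namespace, the `kNumberTypeFloat32` type id, and the caller-side `delete` are assumptions, not confirmed by this note):

```cpp
#include <vector>
#include "include/ms_tensor.h"  // assumed include path

void PrepareInput() {
  // Create a float32 tensor over user-owned data; shape and contents are illustrative.
  std::vector<int> shape = {1, 3, 224, 224};
  std::vector<float> buffer(1 * 3 * 224 * 224, 0.0f);
  mindspore::tensor::MSTensor *input = mindspore::tensor::MSTensor::CreateTensor(
      "input", mindspore::kNumberTypeFloat32, shape, buffer.data(), buffer.size() * sizeof(float));
  // Reshape in place before feeding the tensor to a session.
  input->set_shape({1, 3, 112, 112});
  delete input;  // assumed: user-created tensors are released by the caller
}
```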
Previously, users could access the data of an `MSTensor` through the interface named `MutableData`. However, `MutableData` not only returns the data of the tensor but also allocates data for the tensor if its data is nullptr. So we provide a new interface in ms_tensor.h named `data` that returns the data of the tensor without allocating automatically. The practical difference between the two accessors is sketched after the table below.
<table>
<tr>
<td style="text-align:center"> data </td>
</tr>
<tr>
<td>
```cpp
/// \brief Get the pointer of data in MSTensor.
///
/// \note The data pointer can be used to both write and read data in MSTensor. No memory buffer will be
/// allocated.
///
/// \return the pointer points to data in MSTensor.
virtual void *data() = 0;
```
</td>
</tr>
</table>
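A short sketch of the difference (the include path is an assumption):

```cpp
#include "include/ms_tensor.h"  // assumed include path

float *GetOutputData(mindspore::tensor::MSTensor *output) {
  // data() never allocates: it returns nullptr if no buffer exists yet.
  void *raw = output->data();
  if (raw == nullptr) {
    // MutableData() keeps the old behavior: allocate on demand, then return.
    raw = output->MutableData();
  }
  return static_cast<float *>(raw);  // assumes a float32 tensor
}
```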
###### Delete `DimensionSize()` in ms_tensor.h.([!13515](https://gitee.com/mindspore/mindspore/pulls/13515))
The interface named `DimensionSize` functionally overlaps with the interface named `shape`. For the simplicity of the interface, we delete `DimensionSize` and recommend users to use the new interface named `shape` instead. A migration sketch follows the table below.
<table>
<tr>
<td style="text-align:center"> DimensionSize() </td>
</tr>
<tr>
<td>
```cpp
/// \brief Get size of the dimension of the MindSpore Lite MSTensor index by the parameter index.
///
/// \param[in] index Define index of dimension returned.
///
/// \return Size of dimension of the MindSpore Lite MSTensor.
virtual int DimensionSize(size_t index) const = 0;
```
</td>
</tr>
</table>
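The migration is mechanical; a minimal sketch (the include path and the NCHW layout are assumptions for illustration):

```cpp
#include "include/ms_tensor.h"  // assumed include path

int HeightOf(const mindspore::tensor::MSTensor &tensor) {
  // 1.1.0: int height = tensor.DimensionSize(2);
  // 1.2.0: index into the full shape vector instead.
  return tensor.shape()[2];  // assumes an NCHW tensor
}
```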
###### Move `Allocator` from namespace mindspore::lite to namespace mindspore for unified runtime API.([!13515](https://gitee.com/mindspore/mindspore/pulls/13515))
Previously, class `Allocator` was in namespace mindspore::lite. To provide a unified allocator interface for the unified runtime API, we moved `Allocator` to namespace mindspore. A migration sketch follows the comparison table below.
<table>
<tr>
<td style="text-align:center"> 1.1.0 </td> <td style="text-align:center"> 1.2.0 </td>
</tr>
<tr>
<td>
```cpp
namespace mindspore::lite {
/// \brief Allocator defined a memory pool for malloc memory and free memory dynamically.
///
/// \note List public class and interface for reference.
class Allocator;
}
```
</td>
<td>
```cpp
namespace mindspore {
/// \brief Allocator defined a memory pool for malloc memory and free memory dynamically.
///
/// \note List public class and interface for reference.
class Allocator;
}
```
</td>
</tr>
</table>
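For user code, the migration is a one-line namespace change; a minimal sketch (the forward declaration stands in for the lite headers, which provide the real class):

```cpp
#include <memory>

namespace mindspore { class Allocator; }  // declared by the lite headers in real code

// 1.1.0:
//   std::shared_ptr<mindspore::lite::Allocator> allocator;
// 1.2.0: only the namespace changes.
std::shared_ptr<mindspore::Allocator> allocator;
```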
### Bug fixes
1. Fix the bug that the array in kernel registrar is not initialized.
2. Fix segmentation fault caused by mistakenly releasing OpParameter in the Crop kernel.
3. Fix the bug that a MINDIR aware-training model was ultimately interpreted as a weight-quant model.
## Contributors
Thanks goes to these wonderful people:
Adel, AGroupofProbiotocs, anthonyaje, anzhengqi, askmiao, baihuawei, baiyangfan, bai-yangfan, bingyaweng, BowenK, buxue, caifubi, CaoJian, caojian05, caozhou, Cathy, changzherui, chenbo116, chenfei, chengxianbin, chenhaozhe, chenjianping, chenzomi, chenzupeng, chujinjin, cj, cjh9368, Corleone, damon0626, danish, Danish, davidmc, dayschan, doitH, dong-li001, eric, Eric, fary86, fuzhiye, Gaoxiong, GAO_HYP_XYJ, gengdongjie, Gogery, gongdaguo, gray0v0, gukecai, guoqi, gzhcv, hangq, hanhuifeng2020, Harshvardhan, He, heleiwang, hexia, Hoai, HuangBingjian, huangdongrun, huanghui, huangxinjing, huqi, huzhifeng, hwjiaorui, Islam Amin, Jesse, Jiabin Liu, jianghui58, jiangzhiwen, Jiaqi, jin-xiulang, jinyaohui, jjfeing, John, Jonathan, jonyguo, JulyAi, jzg, kai00, kingfo, kingxian, kpy, kswang, laiyongqiang, leonwanghui, Li, liangchenghui, liangzelang, lichen_101010, lichenever, lihongkang, lilei, limingqi107, ling, linqingke, Lin Xh, liubuyu, liuwenhao4, liuxiao78, liuxiao93, liuyang_655, liuzhongkai, Lixia, lixian, liyanliu, liyong, lizhenyu, luopengting, luoyang, lvchangquan, lvliang, lz, mahdi, Mahdi, maning202007, Margaret_wangrui, mayang, mengyuanli, Ming_blue, nhussain, ougongchang, panfengfeng, panyifeng, Payne, Peilin, peixu_ren, Pengyongrong, qianlong, qianjiahong, r1chardf1d0, riemann_penn, rmdyh, Sheng, shenwei41, simson, Simson, Su, sunsuodong, tao_yunhao, tinazhang, VectorSL, Wan, wandongdong, wangdongxu, wangmin, wangnan39@huawei.com, wangyue01, wangzhe, wanyiming, Wei, wenchunjiang, wilfChen, WilliamLian, wsc, wudenggang, wukesong, wuweikang, wuxuejian, Xiaoda, xiefangqi, xinyunfan, xuanyue, xulei2020, Xun, xuyongfei, yanghaitao, yanghaitao1, yanghaoran, YangLuo, yangruoqi713, yankai, yanzhenxiang2020, yao_yf, yepei6, yeyunpeng, Yi, yoni, yoonlee666, yuchaojie, yujianfeng, yuximiao, zengzitao, Zhang, zhanghaibo5@huawei.com, zhanghuiyao, zhanghui_china, zhangxinfeng3, zhangyihui, zhangz0911gm, zhanke, zhanyuan, zhaodezan, zhaojichen, zhaoting, zhaozhenlong, zhengjun10, zhiqwang, zhoufeng, zhousiyi, zhouyaqiang, zhouyifengCode, Zichun, Zirui, Ziyan, zjun, ZPaC, zymaa.
Contributions of any kind are welcome!
# MindSpore 1.1.1 Release Notes
## MindSpore
@ -295,7 +958,7 @@ Examples:
... self.depend = P.Depend()
...
... def construct(self, x, y):
... mul = x * y
... mul = x - y
... y = self.depend(y, mul)
... ret = self.softmax(y)
... return ret

akg

@ -1 +1 @@
Subproject commit e2a0a264a0be549b51a035e5f783927052f8ead8
Subproject commit a5a856cd2ccabd896be3ce44544eea26bf90e764

build.bat

@ -26,16 +26,12 @@ set VERSION_MAJOR=''
set VERSION_MINOR=''
set VERSION_REVISION=''
find "const int ms_version_major =" mindspore\lite\include\version.h > version.txt
for /f "delims=\= tokens=2" %%a in ('findstr "const int ms_version_major = " version.txt') do (set x=%%a)
for /f "delims=\= tokens=2" %%a in ('findstr /C:"const int ms_version_major = " mindspore\lite\include\version.h') do (set x=%%a)
set VERSION_MAJOR=%x:~1,1%
find "const int ms_version_minor =" mindspore\lite\include\version.h > version.txt
for /f "delims=\= tokens=2" %%b in ('findstr "const int ms_versio/retestn_minor = " version.txt') do (set y=%%b)
for /f "delims=\= tokens=2" %%b in ('findstr /C:"const int ms_version_minor = " mindspore\lite\include\version.h') do (set y=%%b)
set VERSION_MINOR=%y:~1,1%
find "const int ms_version_revision =" mindspore\lite\include\version.h > version.txt
for /f "delims=\= tokens=2" %%c in ('findstr "const int ms_version_revision = " version.txt') do (set z=%%c)
for /f "delims=\= tokens=2" %%c in ('findstr /C:"const int ms_version_revision = " mindspore\lite\include\version.h') do (set z=%%c)
set VERSION_REVISION=%z:~1,1%
del version.txt
echo "======Start building MindSpore Lite %VERSION_MAJOR%.%VERSION_MINOR%.%VERSION_REVISION%======"
@ -78,6 +74,8 @@ IF NOT EXIST "%BUILD_PATH%/mindspore" (
cd %BUILD_PATH%/mindspore
IF "%1%" == "lite" (
cmake --build "%BUILD_PATH%\mindspore" --target clean
rd /s /q "%BASE_PATH%\output"
(git log -1 | findstr "^commit") > %BUILD_PATH%\.commit_id
cmake -DPLATFORM_ARM64=off -DSUPPORT_TRAIN=off ^
-DENABLE_TOOLS=on -DENABLE_CONVERTER=on -DBUILD_TESTCASES=off ^

build.sh

@ -536,27 +536,16 @@ write_commit_file() {
echo ${COMMIT_STR} > "${BASEPATH}/mindspore/lite/build/.commit_id"
}
gen_fbs() {
if [[ "${ENABLE_TOOLS}" == "on" ]]; then
if [[ -f ${BASEPATH}/mindspore/lite/build/tools/schema_gen/schema_gen ]]; then
cd ${BASEPATH}/mindspore/lite/build/tools/schema_gen
./schema_gen
cd -
diff_ops=$(diff ${BASEPATH}/mindspore/lite/build/tools/schema_gen/ops.fbs ${BASEPATH}/mindspore/lite/schema/ops.fbs || true)
if [[ "X${diff_ops}" != "X" ]]; then
cp ${BASEPATH}/mindspore/lite/build/tools/schema_gen/ops.fbs ${BASEPATH}/mindspore/lite/schema/
fi
fi
fi
}
build_lite()
{
rm -rf ${BASEPATH}/output/*
get_version
echo "============ Start building MindSpore Lite ${VERSION_STR} ============"
local LOCAL_LITE_PLATFORM=${LITE_PLATFORM}
local LOCAL_INC_BUILD=${INC_BUILD}
local LOCAL_LITE_ENABLE_GPU=${LITE_ENABLE_GPU}
local LOCAL_LITE_ENABLE_NPU=${ENABLE_NPU}
if [[ "${LITE_LANGUAGE}" == "java" ]]; then
if [[ "X$1" != "X" ]]; then
LOCAL_LITE_PLATFORM=$1
@ -573,13 +562,23 @@ build_lite()
else
LOCAL_LITE_ENABLE_GPU=""
fi
mkdir -p ${BASEPATH}/mindspore/lite/build/java
cd ${BASEPATH}/mindspore/lite/build/
find . -maxdepth 1 | grep -v java | grep '/' | xargs -I {} rm -rf {}
fi
LITE_ENABLE_NPU=${ENABLE_NPU}
if [[ "${LITE_LANGUAGE}" == "cpp" && "${DEVICE}" == "" && "${LOCAL_LITE_PLATFORM}" == "arm64" ]]; then
LOCAL_LITE_ENABLE_GPU="opencl"
LITE_ENABLE_NPU="on"
if [[ "${LITE_LANGUAGE}" == "cpp" ]]; then
if [[ "${DEVICE}" == "" && "${LOCAL_LITE_PLATFORM}" == "arm64" ]]; then
LOCAL_LITE_ENABLE_GPU="opencl"
LOCAL_LITE_ENABLE_NPU="on"
fi
if [[ "${LOCAL_INC_BUILD}" == "off" ]]; then
rm -rf ${BASEPATH}/mindspore/lite/build
fi
mkdir -pv ${BASEPATH}/mindspore/lite/build
fi
if [ "${LITE_ENABLE_NPU}" == "on" ]; then
if [ "${LOCAL_LITE_ENABLE_NPU}" == "on" ]; then
if [ "${LOCAL_LITE_PLATFORM}" == "arm64" ]; then
checkddk
else
@ -588,12 +587,7 @@ build_lite()
fi
fi
cd "${BASEPATH}/mindspore/lite"
if [[ "${LOCAL_INC_BUILD}" == "off" ]]; then
rm -rf build
fi
mkdir -pv build
cd build
cd ${BASEPATH}/mindspore/lite/build
write_commit_file
BUILD_TYPE="Release"
if [[ "${DEBUG_MODE}" == "on" ]]; then
@ -607,7 +601,7 @@ build_lite()
-DANDROID_STL=${ANDROID_STL} -DCMAKE_BUILD_TYPE=${BUILD_TYPE} -DSUPPORT_TRAIN=${SUPPORT_TRAIN} \
-DPLATFORM_ARM64=on -DENABLE_NEON=on -DENABLE_FP16="on" \
-DENABLE_TOOLS=${ENABLE_TOOLS} -DENABLE_CONVERTER=${ENABLE_CONVERTER} -DBUILD_TESTCASES=${RUN_TESTCASES} \
-DSUPPORT_GPU=${LOCAL_LITE_ENABLE_GPU} -DSUPPORT_NPU=${LITE_ENABLE_NPU} -DENABLE_V0=on \
-DSUPPORT_GPU=${LOCAL_LITE_ENABLE_GPU} -DSUPPORT_NPU=${LOCAL_LITE_ENABLE_NPU} -DENABLE_V0=on \
-DOFFLINE_COMPILE=${OPENCL_OFFLINE_COMPILE} -DBUILD_MINDDATA=${COMPILE_MINDDATA_LITE} \
-DCMAKE_INSTALL_PREFIX=${BASEPATH}/output/tmp -DMS_VERSION_MAJOR=${VERSION_MAJOR} \
-DMS_VERSION_MINOR=${VERSION_MINOR} -DMS_VERSION_REVISION=${VERSION_REVISION} -DENABLE_VERBOSE=${ENABLE_VERBOSE} \
@ -619,7 +613,7 @@ build_lite()
-DANDROID_STL=${ANDROID_STL} -DCMAKE_BUILD_TYPE=${BUILD_TYPE} \
-DPLATFORM_ARM32=on -DENABLE_NEON=on -DSUPPORT_TRAIN=${SUPPORT_TRAIN} \
-DENABLE_TOOLS=${ENABLE_TOOLS} -DENABLE_CONVERTER=${ENABLE_CONVERTER} -DBUILD_TESTCASES=${RUN_TESTCASES} \
-DSUPPORT_GPU=${LOCAL_LITE_ENABLE_GPU} -DSUPPORT_NPU=${ENABLE_NPU} -DENABLE_V0=on \
-DSUPPORT_GPU=${LOCAL_LITE_ENABLE_GPU} -DSUPPORT_NPU=${LOCAL_LITE_ENABLE_NPU} -DENABLE_V0=on \
-DOFFLINE_COMPILE=${OPENCL_OFFLINE_COMPILE} -DBUILD_MINDDATA=${COMPILE_MINDDATA_LITE} \
-DCMAKE_INSTALL_PREFIX=${BASEPATH}/output/tmp -DMS_VERSION_MAJOR=${VERSION_MAJOR} \
-DMS_VERSION_MINOR=${VERSION_MINOR} -DMS_VERSION_REVISION=${VERSION_REVISION} -DENABLE_VERBOSE=${ENABLE_VERBOSE} \
@ -627,19 +621,22 @@ build_lite()
else
cmake -DPLATFORM_ARM64=off -DSUPPORT_TRAIN=${SUPPORT_TRAIN} \
-DENABLE_TOOLS=${ENABLE_TOOLS} -DENABLE_CONVERTER=${ENABLE_CONVERTER} -DBUILD_TESTCASES=${RUN_TESTCASES} \
-DCMAKE_BUILD_TYPE=${BUILD_TYPE} -DSUPPORT_GPU=${LOCAL_LITE_ENABLE_GPU} -DSUPPORT_NPU=${ENABLE_NPU} \
-DCMAKE_BUILD_TYPE=${BUILD_TYPE} -DSUPPORT_GPU=${LOCAL_LITE_ENABLE_GPU} -DSUPPORT_NPU=${LOCAL_LITE_ENABLE_NPU} \
-DBUILD_MINDDATA=${COMPILE_MINDDATA_LITE} -DENABLE_V0=on \
-DOFFLINE_COMPILE=${OPENCL_OFFLINE_COMPILE} -DCMAKE_INSTALL_PREFIX=${BASEPATH}/output/tmp \
-DMS_VERSION_MAJOR=${VERSION_MAJOR} -DMS_VERSION_MINOR=${VERSION_MINOR} -DMS_VERSION_REVISION=${VERSION_REVISION} \
-DENABLE_VERBOSE=${ENABLE_VERBOSE} -DX86_64_SIMD=${X86_64_SIMD} "${BASEPATH}/mindspore/lite"
fi
make -j$THREAD_NUM && make install && make package
gen_fbs
if [[ $? -ne 0 ]]; then
echo "---------------- mindspore lite: build failed ----------------"
exit 1
else
mv ${BASEPATH}/output/tmp/*.tar.gz* ${BASEPATH}/output/
if [[ "${LITE_LANGUAGE}" == "cpp" ]]; then
mv ${BASEPATH}/output/tmp/*.tar.gz* ${BASEPATH}/output/
elif [[ "${LITE_LANGUAGE}" == "java" ]]; then
mv ${BASEPATH}/output/tmp/*.tar.gz* ${BASEPATH}/mindspore/lite/build/java
fi
rm -rf ${BASEPATH}/output/tmp/
echo "---------------- mindspore lite: build success ----------------"
if [[ "X$LITE_LANGUAGE" = "Xcpp" ]]; then
@ -654,7 +651,7 @@ build_lite_java_arm64() {
if [[ "X$SUPPORT_TRAIN" = "Xon" ]]; then
JTARBALL=mindspore-lite-${VERSION_STR}-train-android-aarch64
fi
if [[ "X$INC_BUILD" = "Xoff" ]] || [[ ! -f "${BASEPATH}/output/${JTARBALL}.tar.gz" ]]; then
if [[ "X$INC_BUILD" == "Xoff" ]] || [[ ! -f "${BASEPATH}/mindspore/lite/build/java/${JTARBALL}.tar.gz" ]]; then
if [[ "X${DEVICE}" == "Xcpu" ]]; then
build_lite "arm64" "off" ""
elif [[ "X${DEVICE}" == "Xnpu" ]]; then
@ -665,18 +662,18 @@ build_lite_java_arm64() {
fi
fi
# copy arm64 so
cd ${BASEPATH}/output/
cd ${BASEPATH}/mindspore/lite/build/java/
rm -rf ${JTARBALL}
tar -zxvf ${JTARBALL}.tar.gz
[ -n "${JAVA_PATH}" ] && rm -rf ${JAVA_PATH}/java/app/libs/arm64-v8a/
mkdir -p ${JAVA_PATH}/java/app/libs/arm64-v8a/
mkdir -p ${JAVA_PATH}/native/libs/arm64-v8a/
if [[ "X$SUPPORT_TRAIN" = "Xon" ]]; then
cp ${BASEPATH}/output/${JTARBALL}/train/lib/libmindspore-lite.so ${JAVA_PATH}/java/app/libs/arm64-v8a/
cp ${BASEPATH}/output/${JTARBALL}/train/lib/libmindspore-lite.so ${JAVA_PATH}/native/libs/arm64-v8a/
cp ${BASEPATH}/mindspore/lite/build/java/${JTARBALL}/train/lib/libmindspore-lite.so ${JAVA_PATH}/java/app/libs/arm64-v8a/
cp ${BASEPATH}/mindspore/lite/build/java/${JTARBALL}/train/lib/libmindspore-lite.so ${JAVA_PATH}/native/libs/arm64-v8a/
else
cp ${BASEPATH}/output/${JTARBALL}/inference/lib/libmindspore-lite.so ${JAVA_PATH}/java/app/libs/arm64-v8a/
cp ${BASEPATH}/output/${JTARBALL}/inference/lib/libmindspore-lite.so ${JAVA_PATH}/native/libs/arm64-v8a/
cp ${BASEPATH}/mindspore/lite/build/java/${JTARBALL}/inference/lib/libmindspore-lite.so ${JAVA_PATH}/java/app/libs/arm64-v8a/
cp ${BASEPATH}/mindspore/lite/build/java/${JTARBALL}/inference/lib/libmindspore-lite.so ${JAVA_PATH}/native/libs/arm64-v8a/
fi
[ -n "${VERSION_STR}" ] && rm -rf ${JTARBALL}
}
@ -687,22 +684,22 @@ build_lite_java_arm32() {
if [[ "X$SUPPORT_TRAIN" = "Xon" ]]; then
JTARBALL=mindspore-lite-${VERSION_STR}-train-android-aarch32
fi
if [[ "X$INC_BUILD" = "Xoff" ]] || [[ ! -f "${BASEPATH}/output/${JTARBALL}.tar.gz" ]]; then
if [[ "X$INC_BUILD" == "Xoff" ]] || [[ ! -f "${BASEPATH}/mindspore/lite/build/java/${JTARBALL}.tar.gz" ]]; then
build_lite "arm32" "off" ""
fi
# copy arm32 so
cd ${BASEPATH}/output/
cd ${BASEPATH}/mindspore/lite/build/java/
rm -rf ${JTARBALL}
tar -zxvf ${JTARBALL}.tar.gz
[ -n "${JAVA_PATH}" ] && rm -rf ${JAVA_PATH}/java/app/libs/armeabi-v7a/
mkdir -p ${JAVA_PATH}/java/app/libs/armeabi-v7a/
mkdir -p ${JAVA_PATH}/native/libs/armeabi-v7a/
if [[ "X$SUPPORT_TRAIN" = "Xon" ]]; then
cp ${BASEPATH}/output/${JTARBALL}/train/lib/libmindspore-lite.so ${JAVA_PATH}/java/app/libs/armeabi-v7a/
cp ${BASEPATH}/output/${JTARBALL}/train/lib/libmindspore-lite.so ${JAVA_PATH}/native/libs/armeabi-v7a/
cp ${BASEPATH}/mindspore/lite/build/java/${JTARBALL}/train/lib/libmindspore-lite.so ${JAVA_PATH}/java/app/libs/armeabi-v7a/
cp ${BASEPATH}/mindspore/lite/build/java/${JTARBALL}/train/lib/libmindspore-lite.so ${JAVA_PATH}/native/libs/armeabi-v7a/
else
cp ${BASEPATH}/output/${JTARBALL}/inference/lib/libmindspore-lite.so ${JAVA_PATH}/java/app/libs/armeabi-v7a/
cp ${BASEPATH}/output/${JTARBALL}/inference/lib/libmindspore-lite.so ${JAVA_PATH}/native/libs/armeabi-v7a/
cp ${BASEPATH}/mindspore/lite/build/java/${JTARBALL}/inference/lib/libmindspore-lite.so ${JAVA_PATH}/java/app/libs/armeabi-v7a/
cp ${BASEPATH}/mindspore/lite/build/java/${JTARBALL}/inference/lib/libmindspore-lite.so ${JAVA_PATH}/native/libs/armeabi-v7a/
fi
[ -n "${VERSION_STR}" ] && rm -rf ${JTARBALL}
}
@ -710,26 +707,26 @@ build_lite_java_arm32() {
build_lite_java_x86() {
# build mindspore-lite x86
local JTARBALL=mindspore-lite-${VERSION_STR}-inference-linux-x64
if [[ "X$INC_BUILD" = "Xoff" ]] || [[ ! -f "${BASEPATH}/output/${JTARBALL}.tar.gz" ]]; then
if [[ "X$INC_BUILD" == "Xoff" ]] || [[ ! -f "${BASEPATH}/mindspore/lite/build/java/${JTARBALL}.tar.gz" ]]; then
build_lite "x86_64" "off" ""
fi
# copy x86 so
cd ${BASEPATH}/output/
cd ${BASEPATH}/mindspore/lite/build/java
rm -rf ${JTARBALL}
tar -zxvf ${JTARBALL}.tar.gz
[ -n "${JAVA_PATH}" ] && rm -rf ${JAVA_PATH}/java/linux_x86/libs/
mkdir -p ${JAVA_PATH}/java/linux_x86/libs/
mkdir -p ${JAVA_PATH}/native/libs/linux_x86/
cp ${BASEPATH}/output/${JTARBALL}/inference/lib/libmindspore-lite.so ${JAVA_PATH}/java/linux_x86/libs/
cp ${BASEPATH}/output/${JTARBALL}/inference/lib/libmindspore-lite.so ${JAVA_PATH}/native/libs/linux_x86/
cp ${BASEPATH}/mindspore/lite/build/java/${JTARBALL}/inference/lib/libmindspore-lite.so ${JAVA_PATH}/java/linux_x86/libs/
cp ${BASEPATH}/mindspore/lite/build/java/${JTARBALL}/inference/lib/libmindspore-lite.so ${JAVA_PATH}/native/libs/linux_x86/
}
build_jni_arm64() {
# build jni so
cd "${BASEPATH}/mindspore/lite/build"
rm -rf java
mkdir -pv java
cd java
rm -rf java/jni
mkdir -pv java/jni
cd java/jni
cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK}/build/cmake/android.toolchain.cmake" -DANDROID_NATIVE_API_LEVEL="19" \
-DANDROID_NDK="${ANDROID_NDK}" -DANDROID_ABI="arm64-v8a" -DANDROID_TOOLCHAIN_NAME="aarch64-linux-android-clang" \
-DMS_VERSION_MAJOR=${VERSION_MAJOR} -DMS_VERSION_MINOR=${VERSION_MINOR} -DMS_VERSION_REVISION=${VERSION_REVISION} \
@ -741,17 +738,17 @@ build_jni_arm64() {
exit 1
fi
mkdir -p ${JAVA_PATH}/java/app/libs/arm64-v8a/
cp ${BASEPATH}/mindspore/lite/build/java/libmindspore-lite-jni.so ${JAVA_PATH}/java/app/libs/arm64-v8a/
cp ${BASEPATH}/mindspore/lite/build/java/jni/libmindspore-lite-jni.so ${JAVA_PATH}/java/app/libs/arm64-v8a/
mkdir -p ${JAVA_PATH}/native/libs/arm64-v8a/
cp ${BASEPATH}/mindspore/lite/build/java/libmindspore-lite-jni.so ${JAVA_PATH}/native/libs/arm64-v8a/
cp ${BASEPATH}/mindspore/lite/build/java/jni/libmindspore-lite-jni.so ${JAVA_PATH}/native/libs/arm64-v8a/
}
build_jni_arm32() {
# build jni so
cd "${BASEPATH}/mindspore/lite/build"
rm -rf java
mkdir -pv java
cd java
rm -rf java/jni
mkdir -pv java/jni
cd java/jni
cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK}/build/cmake/android.toolchain.cmake" -DANDROID_NATIVE_API_LEVEL="19" \
-DANDROID_NDK="${ANDROID_NDK}" -DANDROID_ABI="armeabi-v7a" -DANDROID_TOOLCHAIN_NAME="aarch64-linux-android-clang" \
-DMS_VERSION_MAJOR=${VERSION_MAJOR} -DMS_VERSION_MINOR=${VERSION_MINOR} -DMS_VERSION_REVISION=${VERSION_REVISION} \
@ -763,17 +760,17 @@ build_jni_arm32() {
exit 1
fi
mkdir -p ${JAVA_PATH}/java/app/libs/armeabi-v7a/
cp ${BASEPATH}/mindspore/lite/build/java/libmindspore-lite-jni.so ${JAVA_PATH}/java/app/libs/armeabi-v7a/
cp ${BASEPATH}/mindspore/lite/build/java/jni/libmindspore-lite-jni.so ${JAVA_PATH}/java/app/libs/armeabi-v7a/
mkdir -p ${JAVA_PATH}/native/libs/armeabi-v7a/
cp ${BASEPATH}/mindspore/lite/build/java/libmindspore-lite-jni.so ${JAVA_PATH}/native/libs/armeabi-v7a/
cp ${BASEPATH}/mindspore/lite/build/java/jni/libmindspore-lite-jni.so ${JAVA_PATH}/native/libs/armeabi-v7a/
}
build_jni_x86_64() {
# build jni so
cd "${BASEPATH}/mindspore/lite/build"
rm -rf java
mkdir -pv java
cd java
rm -rf java/jni
mkdir -pv java/jni
cd java/jni
cmake -DMS_VERSION_MAJOR=${VERSION_MAJOR} -DMS_VERSION_MINOR=${VERSION_MINOR} -DMS_VERSION_REVISION=${VERSION_REVISION} \
-DENABLE_VERBOSE=${ENABLE_VERBOSE} "${JAVA_PATH}/native/"
make -j$THREAD_NUM
@ -782,9 +779,9 @@ build_jni_x86_64() {
exit 1
fi
mkdir -p ${JAVA_PATH}/java/linux_x86/libs/
cp ${BASEPATH}/mindspore/lite/build/java/libmindspore-lite-jni.so ${JAVA_PATH}/java/linux_x86/libs/
cp ${BASEPATH}/mindspore/lite/build/java/jni/libmindspore-lite-jni.so ${JAVA_PATH}/java/linux_x86/libs/
mkdir -p ${JAVA_PATH}/native/libs/linux_x86/
cp ${BASEPATH}/mindspore/lite/build/java/libmindspore-lite-jni.so ${JAVA_PATH}/native/libs/linux_x86/
cp ${BASEPATH}/mindspore/lite/build/java/jni/libmindspore-lite-jni.so ${JAVA_PATH}/native/libs/linux_x86/
}
check_java_home() {
@ -799,6 +796,9 @@ check_java_home() {
build_java() {
JAVA_PATH=${BASEPATH}/mindspore/lite/java
get_version
if [[ "X${INC_BUILD}" == "Xoff" ]]; then
rm -rf ${BASEPATH}/mindspore/lite/build
fi
# build common module
cd ${JAVA_PATH}/java/common
gradle clean
@ -820,8 +820,6 @@ build_java() {
cd ${JAVA_PATH}/java/app/build
zip -r mindspore-lite-maven-${VERSION_STR}.zip mindspore
# copy output
cp mindspore-lite-maven-${VERSION_STR}.zip ${BASEPATH}/output/
# build linux x86 jar
check_java_home
@ -838,13 +836,14 @@ build_java() {
mkdir -p ${JAVA_PATH}/java/linux_x86/build/lib
cp ${JAVA_PATH}/java/linux_x86/libs/*.so ${JAVA_PATH}/java/linux_x86/build/lib/jar
cd ${JAVA_PATH}/java/linux_x86/build/
cp -r ${JAVA_PATH}/java/linux_x86/build/lib ${JAVA_PATH}/java/linux_x86/build/mindspore-lite-${VERSION_STR}-inference-linux-x64-jar
mkdir -p ${JAVA_PATH}/java/linux_x86/build/mindspore-lite-${VERSION_STR}-inference-linux-x64-jar
tar czvf mindspore-lite-${VERSION_STR}-inference-linux-x64-jar.tar.gz ./mindspore-lite-${VERSION_STR}-inference-linux-x64-jar
local LINUX_X86_PACKAGE_NAME=mindspore-lite-${VERSION_STR}-inference-linux-x64-jar
cp -r ${JAVA_PATH}/java/linux_x86/build/lib ${JAVA_PATH}/java/linux_x86/build/${LINUX_X86_PACKAGE_NAME}
tar czvf ${LINUX_X86_PACKAGE_NAME}.tar.gz ${LINUX_X86_PACKAGE_NAME}
# copy output
cp mindspore-lite-${VERSION_STR}-inference-linux-x64-jar.tar.gz ${BASEPATH}/output
cp ${JAVA_PATH}/java/app/build/mindspore-lite-maven-${VERSION_STR}.zip ${BASEPATH}/output
cp ${LINUX_X86_PACKAGE_NAME}.tar.gz ${BASEPATH}/output
cd ${BASEPATH}/output
[ -n "${VERSION_STR}" ] && rm -rf mindspore-lite-${VERSION_STR}-inference-linux-x64
[ -n "${VERSION_STR}" ] && rm -rf ${BASEPATH}/mindspore/lite/build/java/mindspore-lite-${VERSION_STR}-inference-linux-x64
exit 0
}


@ -1,4 +1,4 @@
## define customized find fucntions, print customized error messages
## define customized find functions, print customized error messages
function(find_required_package pkg_name)
find_package(${pkg_name})
if(NOT ${pkg_name}_FOUND)
@ -24,7 +24,7 @@ if(Python3_FOUND)
message("Python3 library path: ${Python3_LIBRARY}")
message("Python3 interpreter: ${Python3_EXECUTABLE}")
elseif(Python3_LIBRARY AND Python3_EXECUTABLE AND
${Python3_VERSION} VERSION_GREATER_EQUAL "3.7.0" AND ${Python3_VERSION} VERSION_LESS "3.8.9")
${Python3_VERSION} VERSION_GREATER_EQUAL "3.7.0" AND ${Python3_VERSION} VERSION_LESS "3.9.9")
message(WARNING "Maybe python3 environment is broken.")
message("Python3 library path: ${Python3_LIBRARY}")
message("Python3 interpreter: ${Python3_EXECUTABLE}")


@ -17,7 +17,8 @@ if(NOT TARGET gtest)
set(CMAKE_MACOSX_RPATH TRUE)
set(CMAKE_CXX_FLAGS "${SECURE_CXX_FLAGS}")
if(CMAKE_COMPILER_IS_GNUCXX AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER "5.0" AND CMAKE_HOST_SYSTEM_PROCESSOR MATCHES "x86_64" AND SYSTEM_TYPE MATCHES "euleros")
if(CMAKE_COMPILER_IS_GNUCXX AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER "5.0"
AND CMAKE_HOST_SYSTEM_PROCESSOR MATCHES "x86_64" AND SYSTEM_TYPE MATCHES "euleros")
# -D_GLIBCXX_USE_CXX11_ABI=0 added for the ABI incompatible for libtsdclient.so
# set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -D_GLIBCXX_USE_CXX11_ABI=0")
endif()


@ -86,8 +86,10 @@ function(ms_protobuf_generate_py c_var h_var py_var)
COMMAND protobuf::protoc -I${file_dir} --cpp_out=${CMAKE_BINARY_DIR}/${rel_path} ${abs_file}
COMMAND protobuf::protoc -I${file_dir} --python_out=${CMAKE_BINARY_DIR}/${rel_path} ${abs_file}
COMMAND protobuf::protoc -I${file_dir} --python_out=${CMAKE_BINARY_DIR}/${rel_path} ${abs_file}
COMMAND perl -pi -e "s/import (.+_pb2.*)/from . import \\1/" "${CMAKE_BINARY_DIR}/${rel_path}/${file_name}_pb2.py"
COMMAND cp "${CMAKE_BINARY_DIR}/${rel_path}/${file_name}_pb2.py" "${PROJECT_SOURCE_DIR}/mindspore/train/"
COMMAND perl -pi -e "s/import (.+_pb2.*)/from . import \\1/"
"${CMAKE_BINARY_DIR}/${rel_path}/${file_name}_pb2.py"
COMMAND cp "${CMAKE_BINARY_DIR}/${rel_path}/${file_name}_pb2.py"
"${PROJECT_SOURCE_DIR}/mindspore/train/"
DEPENDS protobuf::protoc ${abs_file}
COMMENT "Running C++ protocol buffer compiler on ${file}" VERBATIM)
endforeach()


@ -17,7 +17,8 @@ function(find_python_package out_inc out_lib)
set(${out_inc} ${inc} PARENT_SCOPE)
execute_process(
COMMAND "${PYTHON_EXECUTABLE}" -c "import distutils.sysconfig as sysconfig; import os; print(os.path.join(sysconfig.get_config_var('LIBDIR'), sysconfig.get_config_var('LDLIBRARY')))"
COMMAND "${PYTHON_EXECUTABLE}" -c "import distutils.sysconfig as sysconfig; import os; \
print(os.path.join(sysconfig.get_config_var('LIBDIR'), sysconfig.get_config_var('LDLIBRARY')))"
RESULT_VARIABLE result
OUTPUT_VARIABLE lib)
string(STRIP "${lib}" lib)


@ -63,11 +63,8 @@ function(ms_build_flatbuffers source_schema_files
endif()
endfunction()
function(ms_build_flatbuffers_lite source_schema_files
source_schema_dirs
custom_target_name
generated_output_dir
if_inner)
function(ms_build_flatbuffers_lite
source_schema_files source_schema_dirs custom_target_name generated_output_dir if_inner)
set(total_schema_dirs "")
set(total_generated_files "")


@ -1,5 +1,15 @@
set(glog_CXXFLAGS "-D_FORTIFY_SOURCE=2 -O2 ${SECURE_CXX_FLAGS} -D_GLIBCXX_USE_CXX11_ABI=0")
set(glog_CXXFLAGS "-D_FORTIFY_SOURCE=2 -O2 ${SECURE_CXX_FLAGS} -Dgoogle=mindspore_private")
set(glog_CFLAGS "-D_FORTIFY_SOURCE=2 -O2")
if(NOT ENABLE_GLIBCXX)
set(glog_CXXFLAGS "${glog_CXXFLAGS} -D_GLIBCXX_USE_CXX11_ABI=0")
endif()
if(BUILD_LITE)
set(glog_patch "")
set(glog_lib glog)
else()
set(glog_patch ${CMAKE_SOURCE_DIR}/third_party/patch/glog/glog.patch001)
set(glog_lib mindspore_glog)
endif()
if(ENABLE_GITEE)
set(REQ_URL "https://gitee.com/mirrors/glog/repository/archive/v0.4.0.tar.gz")
set(MD5 "22fe340ddc231e6c8e46bc295320f8ee")
@ -7,11 +17,13 @@ else()
set(REQ_URL "https://github.com/google/glog/archive/v0.4.0.tar.gz")
set(MD5 "0daea8785e6df922d7887755c3d100d0")
endif()
mindspore_add_pkg(glog
VER 0.4.0
LIBS glog
LIBS ${glog_lib}
URL ${REQ_URL}
MD5 ${MD5}
PATCHES ${glog_patch}
CMAKE_OPTION -DBUILD_TESTING=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DBUILD_SHARED_LIBS=ON -DWITH_GFLAGS=OFF)
include_directories(${glog_INC})
add_library(mindspore::glog ALIAS glog::glog)
add_library(mindspore::glog ALIAS glog::${glog_lib})


@ -1,10 +1,16 @@
set(grpc_USE_STATIC_LIBS ON)
if(${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
set(grpc_CXXFLAGS "-fstack-protector-all -Wno-uninitialized -Wno-unused-parameter -fPIC -fvisibility=hidden -D_FORTIFY_SOURCE=2 -O2")
set(grpc_CXXFLAGS "-fstack-protector-all -Wno-uninitialized -Wno-unused-parameter -fPIC \
-fvisibility=hidden -D_FORTIFY_SOURCE=2 -O2")
elseif(${CMAKE_SYSTEM_NAME} MATCHES "Windows")
set(grpc_CXXFLAGS "-fstack-protector-all -Wno-maybe-uninitialized -Wno-unused-parameter -fPIC -fvisibility=hidden -D_FORTIFY_SOURCE=2 -O2")
set(grpc_CXXFLAGS "-fstack-protector-all -Wno-maybe-uninitialized -Wno-unused-parameter \
-fPIC -fvisibility=hidden -D_FORTIFY_SOURCE=2 -O2")
else()
set(grpc_CXXFLAGS "-fstack-protector-all -Wno-maybe-uninitialized -Wno-unused-parameter -fPIC -fvisibility=hidden -D_FORTIFY_SOURCE=2 -D_GLIBCXX_USE_CXX11_ABI=0 -O2")
set(grpc_CXXFLAGS "-fstack-protector-all -Wno-maybe-uninitialized -Wno-unused-parameter \
-fPIC -fvisibility=hidden -D_FORTIFY_SOURCE=2 -O2")
if(NOT ENABLE_GLIBCXX)
set(grpc_CXXFLAGS "${grpc_CXXFLAGS} -D_GLIBCXX_USE_CXX11_ABI=0")
endif()
endif()
set(grpc_LDFLAGS "-Wl,-z,relro,-z,now,-z,noexecstack")
@ -106,7 +112,8 @@ function(ms_grpc_generate c_var h_var)
COMMAND ${CMAKE_COMMAND} -E make_directory "${CMAKE_BINARY_DIR}/proto"
COMMAND protobuf::protoc --version
COMMAND protobuf::protoc -I${file_dir} --cpp_out=${CMAKE_BINARY_DIR}/proto
--grpc_out=${CMAKE_BINARY_DIR}/proto --plugin=protoc-gen-grpc=$<TARGET_FILE:grpc::grpc_cpp_plugin> ${abs_file}
--grpc_out=${CMAKE_BINARY_DIR}/proto
--plugin=protoc-gen-grpc=$<TARGET_FILE:grpc::grpc_cpp_plugin> ${abs_file}
DEPENDS protobuf::protoc grpc::grpc_cpp_plugin ${abs_file}
COMMENT "Running C++ gRPC compiler on ${file}" VERBATIM)
endforeach()
@ -114,5 +121,4 @@ function(ms_grpc_generate c_var h_var)
set_source_files_properties(${${c_var}} ${${h_var}} PROPERTIES GENERATED TRUE)
set(${c_var} ${${c_var}} PARENT_SCOPE)
set(${h_var} ${${h_var}} PARENT_SCOPE)
endfunction()

View File

@ -24,7 +24,9 @@ if(BUILD_LITE)
${CMAKE_OPTION})
endif()
else()
set(gtest_CXXFLAGS "${gtest_CXXFLAGS} -D_GLIBCXX_USE_CXX11_ABI=0")
if(NOT ENABLE_GLIBCXX)
set(gtest_CXXFLAGS "${gtest_CXXFLAGS} -D_GLIBCXX_USE_CXX11_ABI=0")
endif()
endif()
if(ENABLE_GITEE)


@ -28,7 +28,9 @@ else()
URL ${REQ_URL}
MD5 ${MD5}
PATCHES ${CMAKE_SOURCE_DIR}/third_party/patch/icu4c/icu4c.patch01
CONFIGURE_COMMAND ./icu4c/source/runConfigureICU MacOSX --enable-rpath --disable-tests --disable-samples --disable-icuio --disable-extras ICU_DATA_FILTER_FILE=${CMAKE_BINARY_DIR}/icu4c_filter.json
CONFIGURE_COMMAND ./icu4c/source/runConfigureICU MacOSX --enable-rpath --disable-tests
--disable-samples --disable-icuio --disable-extras
ICU_DATA_FILTER_FILE=${CMAKE_BINARY_DIR}/icu4c_filter.json
)
else()
mindspore_add_pkg(icu4c
@ -37,7 +39,9 @@ else()
URL ${REQ_URL}
MD5 ${MD5}
PATCHES ${CMAKE_SOURCE_DIR}/third_party/patch/icu4c/icu4c.patch01
CONFIGURE_COMMAND ./icu4c/source/runConfigureICU Linux --enable-rpath --disable-tests --disable-samples --disable-icuio --disable-extras ICU_DATA_FILTER_FILE=${CMAKE_BINARY_DIR}/icu4c_filter.json
CONFIGURE_COMMAND ./icu4c/source/runConfigureICU Linux --enable-rpath --disable-tests --disable-samples
--disable-icuio --disable-extras
ICU_DATA_FILTER_FILE=${CMAKE_BINARY_DIR}/icu4c_filter.json
)
endif()
include_directories(${icu4c_INC})

View File

@ -8,9 +8,11 @@ else()
endif()
if(${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
set(jpeg_turbo_CFLAGS "-fstack-protector-all -Wno-uninitialized -Wno-unused-parameter -fPIC -D_FORTIFY_SOURCE=2 -O2")
set(jpeg_turbo_CFLAGS "-fstack-protector-all -Wno-uninitialized -Wno-unused-parameter -fPIC -D_FORTIFY_SOURCE=2 \
-O2")
else()
set(jpeg_turbo_CFLAGS "-fstack-protector-all -Wno-maybe-uninitialized -Wno-unused-parameter -fPIC -D_FORTIFY_SOURCE=2 -O2")
set(jpeg_turbo_CFLAGS "-fstack-protector-all -Wno-maybe-uninitialized -Wno-unused-parameter -fPIC \
-D_FORTIFY_SOURCE=2 -O2")
endif()
set(jpeg_turbo_LDFLAGS "-Wl,-z,relro,-z,now,-z,noexecstack,-s")

View File

@ -7,7 +7,8 @@ if(ENABLE_GITEE)
set(REQ_URL "https://gitee.com/mirrors/libevent/repository/archive/release-2.1.12-stable.tar.gz")
set(MD5 "c9036513dd9e5b4fa1c81ade23b7ead2")
else()
set(REQ_URL "https://github.com/libevent/libevent/releases/download/release-2.1.12-stable/libevent-2.1.12-stable.tar.gz")
set(REQ_URL
"https://github.com/libevent/libevent/releases/download/release-2.1.12-stable/libevent-2.1.12-stable.tar.gz")
set(MD5 "b5333f021f880fe76490d8a799cd79f4")
endif()


@ -39,7 +39,8 @@ function(gene_opencl BASEPATH)
if(NOT RESULT EQUAL "0")
message(FATAL_ERROR "error! when generate ${inc_file_ex}")
endif()
__exec_cmd(COMMAND sed -i "1i\\static const char *${kernel_name}_source =\\\"\\\\n\\\" \\\\" ${inc_file_ex} WORKING_DIRECTORY ${CL_SRC_DIR})
__exec_cmd(COMMAND sed -i "1i\\static const char *${kernel_name}_source =\\\"\\\\n\\\" \\\\"
${inc_file_ex} WORKING_DIRECTORY ${CL_SRC_DIR})
__exec_cmd(COMMAND sed -i "$a\\\\\;" ${inc_file_ex} WORKING_DIRECTORY ${CL_SRC_DIR})
endforeach()
endfunction()

View File

@ -8,74 +8,159 @@ elseif(${CMAKE_SYSTEM_NAME} MATCHES "Windows")
set(opencv_CXXFLAGS "${opencv_CXXFLAGS} -Wno-attributes -Wno-unknown-pragmas")
set(opencv_CXXFLAGS "${opencv_CXXFLAGS} -Wno-unused-value -Wno-implicit-fallthrough")
else()
set(opencv_CXXFLAGS "-fstack-protector-all -Wno-maybe-uninitialized -Wno-unused-parameter -D_FORTIFY_SOURCE=2 -D_GLIBCXX_USE_CXX11_ABI=0 -O2")
set(opencv_CXXFLAGS "-fstack-protector-all -Wno-maybe-uninitialized -Wno-unused-parameter -D_FORTIFY_SOURCE=2")
set(opencv_CXXFLAGS "${opencv_CXXFLAGS} -O2")
if(NOT ENABLE_GLIBCXX)
set(opencv_CXXFLAGS "${opencv_CXXFLAGS} -D_GLIBCXX_USE_CXX11_ABI=0")
endif()
set(opencv_CFLAGS "-fstack-protector-all -Wno-maybe-uninitialized -Wno-unused-parameter -D_FORTIFY_SOURCE=2 -O2")
set(opencv_LDFLAGS "-Wl,-z,relro,-z,now,-z,noexecstack")
endif()
if(ENABLE_GITEE)
set(REQ_URL "https://gitee.com/mirrors/opencv/repository/archive/4.2.0.tar.gz")
set(MD5 "00424c7c4acde1e26ebf17aaa155bf23")
if(PYTHON_VERSION MATCHES "3.9")
set(REQ_URL "https://gitee.com/mirrors/opencv/repository/archive/4.5.1.tar.gz")
set(MD5 "e74309207f2fa88fb6cc417d8ea9ff09")
elseif((PYTHON_VERSION MATCHES "3.7") OR (PYTHON_VERSION MATCHES "3.8"))
set(REQ_URL "https://gitee.com/mirrors/opencv/repository/archive/4.2.0.tar.gz")
set(MD5 "00424c7c4acde1e26ebf17aaa155bf23")
else()
message("Could not find 'Python 3.8' or 'Python 3.7' or 'Python 3.9'")
return()
endif()
else()
set(REQ_URL "https://github.com/opencv/opencv/archive/4.2.0.tar.gz")
set(MD5 "e8cb208ce2723481408b604b480183b6")
if(PYTHON_VERSION MATCHES "3.9")
set(REQ_URL "https://github.com/opencv/opencv/archive/4.5.1.tar.gz")
set(MD5 "2205d3169238ec1f184438a96de68513")
elseif((PYTHON_VERSION MATCHES "3.7") OR (PYTHON_VERSION MATCHES "3.8"))
set(REQ_URL "https://github.com/opencv/opencv/archive/4.2.0.tar.gz")
set(MD5 "e8cb208ce2723481408b604b480183b6")
else()
message("Could not find 'Python 3.8' or 'Python 3.7' or 'Python 3.9'")
return()
endif()
endif()
if(WIN32)
mindspore_add_pkg(opencv
VER 4.2.0
LIBS libopencv_core420.dll.a libopencv_imgcodecs420.dll.a libopencv_imgproc420.dll.a
LIB_PATH x64/mingw/lib
URL ${REQ_URL}
MD5 ${MD5}
CMAKE_OPTION -DCMAKE_BUILD_TYPE=Release -DWITH_PROTOBUF=OFF -DWITH_WEBP=OFF -DWITH_IPP=OFF -DWITH_ADE=OFF
-DBUILD_ZLIB=ON
-DBUILD_JPEG=ON
-DBUILD_PNG=ON
-DBUILD_OPENEXR=ON
-DBUILD_TESTS=OFF
-DBUILD_PERF_TESTS=OFF
-DBUILD_opencv_apps=OFF
-DCMAKE_SKIP_RPATH=TRUE
-DBUILD_opencv_python3=OFF
-DBUILD_opencv_videoio=OFF
-DWITH_FFMPEG=OFF
-DWITH_TIFF=ON
-DBUILD_TIFF=OFF
-DWITH_JASPER=OFF
-DBUILD_JASPER=OFF
-DTIFF_INCLUDE_DIR=${tiff_INC}
-DTIFF_LIBRARY=${tiff_LIB})
if(PYTHON_VERSION MATCHES "3.9")
mindspore_add_pkg(opencv
VER 4.5.1
LIBS libopencv_core451.dll.a libopencv_imgcodecs451.dll.a libopencv_imgproc451.dll.a
LIB_PATH x64/mingw/lib
URL ${REQ_URL}
MD5 ${MD5}
CMAKE_OPTION -DCMAKE_BUILD_TYPE=Release -DWITH_PROTOBUF=OFF -DWITH_WEBP=OFF -DWITH_IPP=OFF
-DWITH_ADE=OFF
-DBUILD_ZLIB=ON
-DBUILD_JPEG=ON
-DBUILD_PNG=ON
-DBUILD_OPENEXR=ON
-DBUILD_TESTS=OFF
-DBUILD_PERF_TESTS=OFF
-DBUILD_opencv_apps=OFF
-DCMAKE_SKIP_RPATH=TRUE
-DBUILD_opencv_python3=OFF
-DBUILD_opencv_videoio=OFF
-DWITH_FFMPEG=OFF
-DWITH_TIFF=ON
-DBUILD_TIFF=OFF
-DWITH_JASPER=OFF
-DBUILD_JASPER=OFF
-DTIFF_INCLUDE_DIR=${tiff_INC}
-DTIFF_LIBRARY=${tiff_LIB})
elseif(PYTHON_VERSION MATCHES "3.8" OR PYTHON_VERSION MATCHES "3.7")
mindspore_add_pkg(opencv
VER 4.2.0
LIBS libopencv_core420.dll.a libopencv_imgcodecs420.dll.a libopencv_imgproc420.dll.a
LIB_PATH x64/mingw/lib
URL ${REQ_URL}
MD5 ${MD5}
CMAKE_OPTION -DCMAKE_BUILD_TYPE=Release -DWITH_PROTOBUF=OFF -DWITH_WEBP=OFF -DWITH_IPP=OFF
-DWITH_ADE=OFF
-DBUILD_ZLIB=ON
-DBUILD_JPEG=ON
-DBUILD_PNG=ON
-DBUILD_OPENEXR=ON
-DBUILD_TESTS=OFF
-DBUILD_PERF_TESTS=OFF
-DBUILD_opencv_apps=OFF
-DCMAKE_SKIP_RPATH=TRUE
-DBUILD_opencv_python3=OFF
-DBUILD_opencv_videoio=OFF
-DWITH_FFMPEG=OFF
-DWITH_TIFF=ON
-DBUILD_TIFF=OFF
-DWITH_JASPER=OFF
-DBUILD_JASPER=OFF
-DWITH_LAPACK=OFF
-DTIFF_INCLUDE_DIR=${tiff_INC}
-DTIFF_LIBRARY=${tiff_LIB})
endif()
else()
mindspore_add_pkg(opencv
VER 4.2.0
LIBS opencv_core opencv_imgcodecs opencv_imgproc
URL ${REQ_URL}
MD5 ${MD5}
CMAKE_OPTION -DCMAKE_BUILD_TYPE=Release -DWITH_PROTOBUF=OFF -DWITH_WEBP=OFF -DWITH_IPP=OFF -DWITH_ADE=OFF
-DBUILD_ZLIB=ON
-DBUILD_JPEG=ON
-DBUILD_PNG=ON
-DBUILD_OPENEXR=ON
-DBUILD_TESTS=OFF
-DBUILD_PERF_TESTS=OFF
-DBUILD_opencv_apps=OFF
-DCMAKE_SKIP_RPATH=TRUE
-DBUILD_opencv_python3=OFF
-DWITH_FFMPEG=OFF
-DWITH_TIFF=ON
-DBUILD_TIFF=OFF
-DWITH_JASPER=OFF
-DBUILD_JASPER=OFF
-DTIFF_INCLUDE_DIR=${tiff_INC}
-DTIFF_LIBRARY=${tiff_LIB})
if(PYTHON_VERSION MATCHES "3.9")
mindspore_add_pkg(opencv
VER 4.5.1
LIBS opencv_core opencv_imgcodecs opencv_imgproc
URL ${REQ_URL}
MD5 ${MD5}
CMAKE_OPTION -DCMAKE_BUILD_TYPE=Release -DWITH_PROTOBUF=OFF -DWITH_WEBP=OFF -DWITH_IPP=OFF
-DWITH_ADE=OFF
-DBUILD_ZLIB=ON
-DBUILD_JPEG=ON
-DBUILD_PNG=ON
-DBUILD_OPENEXR=ON
-DBUILD_TESTS=OFF
-DBUILD_PERF_TESTS=OFF
-DBUILD_opencv_apps=OFF
-DCMAKE_SKIP_RPATH=TRUE
-DBUILD_opencv_python3=OFF
-DWITH_FFMPEG=OFF
-DWITH_TIFF=ON
-DBUILD_TIFF=OFF
-DWITH_JASPER=OFF
-DBUILD_JASPER=OFF
-DTIFF_INCLUDE_DIR=${tiff_INC}
-DTIFF_LIBRARY=${tiff_LIB})
elseif(PYTHON_VERSION MATCHES "3.8" OR PYTHON_VERSION MATCHES "3.7")
mindspore_add_pkg(opencv
VER 4.2.0
LIBS opencv_core opencv_imgcodecs opencv_imgproc
URL ${REQ_URL}
MD5 ${MD5}
CMAKE_OPTION -DCMAKE_BUILD_TYPE=Release -DWITH_PROTOBUF=OFF -DWITH_WEBP=OFF -DWITH_IPP=OFF
-DWITH_ADE=OFF
-DBUILD_ZLIB=ON
-DBUILD_JPEG=ON
-DBUILD_PNG=ON
-DBUILD_OPENEXR=ON
-DBUILD_TESTS=OFF
-DBUILD_PERF_TESTS=OFF
-DBUILD_opencv_apps=OFF
-DCMAKE_SKIP_RPATH=TRUE
-DBUILD_opencv_python3=OFF
-DWITH_FFMPEG=OFF
-DWITH_TIFF=ON
-DBUILD_TIFF=OFF
-DWITH_JASPER=OFF
-DBUILD_JASPER=OFF
-DWITH_LAPACK=OFF
-DTIFF_INCLUDE_DIR=${tiff_INC}
-DTIFF_LIBRARY=${tiff_LIB})
endif()
endif()
if(WIN32)
include_directories(${opencv_INC})
add_library(mindspore::opencv_core ALIAS opencv::libopencv_core420.dll.a)
add_library(mindspore::opencv_imgcodecs ALIAS opencv::libopencv_imgcodecs420.dll.a)
add_library(mindspore::opencv_imgproc ALIAS opencv::libopencv_imgproc420.dll.a)
if(PYTHON_VERSION MATCHES "3.9")
include_directories(${opencv_INC})
add_library(mindspore::opencv_core ALIAS opencv::libopencv_core451.dll.a)
add_library(mindspore::opencv_imgcodecs ALIAS opencv::libopencv_imgcodecs451.dll.a)
add_library(mindspore::opencv_imgproc ALIAS opencv::libopencv_imgproc451.dll.a)
elseif(PYTHON_VERSION MATCHES "3.8" OR PYTHON_VERSION MATCHES "3.7")
include_directories(${opencv_INC})
add_library(mindspore::opencv_core ALIAS opencv::libopencv_core420.dll.a)
add_library(mindspore::opencv_imgcodecs ALIAS opencv::libopencv_imgcodecs420.dll.a)
add_library(mindspore::opencv_imgproc ALIAS opencv::libopencv_imgproc420.dll.a)
endif()
else()
include_directories(${opencv_INC}/opencv4)
add_library(mindspore::opencv_core ALIAS opencv::opencv_core)


@ -1,5 +1,5 @@
set(projectq_CXXFLAGS "-fopenmp -O2 -ffast-mast -march=native -DINTRIN")
set(projectq_CFLAGS "-fopenmp -O2 -ffast-mast -march=native -DINTRIN")
set(projectq_CXXFLAGS "-fopenmp -O2 -ffast-mast -mavx -DINTRIN")
set(projectq_CFLAGS "-fopenmp -O2 -ffast-mast -mavx -DINTRIN")
if(ENABLE_GITEE)
set(REQ_URL "https://gitee.com/mirrors/ProjectQ/repository/archive/v0.5.1.tar.gz")


@ -1,13 +1,20 @@
set(protobuf_USE_STATIC_LIBS ON)
if(BUILD_LITE)
set(protobuf_CXXFLAGS "-fstack-protector-all -Wno-maybe-uninitialized -Wno-unused-parameter -fPIC -fvisibility=hidden -D_FORTIFY_SOURCE=2 -O2")
set(protobuf_CXXFLAGS "-fstack-protector-all -Wno-maybe-uninitialized -Wno-unused-parameter \
-fPIC -fvisibility=hidden -D_FORTIFY_SOURCE=2 -O2")
else()
if(${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
set(protobuf_CXXFLAGS "-fstack-protector-all -Wno-uninitialized -Wno-unused-parameter -fPIC -fvisibility=hidden -D_FORTIFY_SOURCE=2 -O2")
set(protobuf_CXXFLAGS "-fstack-protector-all -Wno-uninitialized -Wno-unused-parameter -fPIC \
-fvisibility=hidden -D_FORTIFY_SOURCE=2 -O2")
elseif(${CMAKE_SYSTEM_NAME} MATCHES "Windows")
set(protobuf_CXXFLAGS "-fstack-protector-all -Wno-maybe-uninitialized -Wno-unused-parameter -fPIC -fvisibility=hidden -D_FORTIFY_SOURCE=2 -O2")
set(protobuf_CXXFLAGS "-fstack-protector-all -Wno-maybe-uninitialized -Wno-unused-parameter \
-fPIC -fvisibility=hidden -D_FORTIFY_SOURCE=2 -O2")
else()
set(protobuf_CXXFLAGS "-fstack-protector-all -Wno-maybe-uninitialized -Wno-unused-parameter -fPIC -fvisibility=hidden -D_FORTIFY_SOURCE=2 -D_GLIBCXX_USE_CXX11_ABI=0 -O2")
set(protobuf_CXXFLAGS "-fstack-protector-all -Wno-maybe-uninitialized -Wno-unused-parameter \
-fPIC -fvisibility=hidden -D_FORTIFY_SOURCE=2 -O2")
if(NOT ENABLE_GLIBCXX)
set(protobuf_CXXFLAGS "${protobuf_CXXFLAGS} -D_GLIBCXX_USE_CXX11_ABI=0")
endif()
endif()
endif()
@ -69,7 +76,6 @@ function(ms_protobuf_generate c_var h_var)
set_source_files_properties(${${c_var}} ${${h_var}} PROPERTIES GENERATED TRUE)
set(${c_var} ${${c_var}} PARENT_SCOPE)
set(${h_var} ${${h_var}} PARENT_SCOPE)
endfunction()
function(ms_protobuf_generate_py c_var h_var py_var)
@ -100,8 +106,10 @@ function(ms_protobuf_generate_py c_var h_var py_var)
COMMAND protobuf::protoc -I${file_dir} --cpp_out=${CMAKE_BINARY_DIR}/proto ${abs_file}
COMMAND protobuf::protoc -I${file_dir} --python_out=${CMAKE_BINARY_DIR}/proto ${abs_file}
COMMAND protobuf::protoc -I${file_dir} --python_out=${CMAKE_BINARY_DIR}/proto ${abs_file}
COMMAND perl -pi.bak -e "s/import (.+_pb2.*)/from . import \\1/" "${CMAKE_BINARY_DIR}/proto/${file_name}_pb2.py"
COMMAND ${CMAKE_COMMAND} -E copy "${CMAKE_BINARY_DIR}/proto/${file_name}_pb2.py" "${PROJECT_SOURCE_DIR}/mindspore/train/"
COMMAND perl -pi.bak -e "s/import (.+_pb2.*)/from . import \\1/"
"${CMAKE_BINARY_DIR}/proto/${file_name}_pb2.py"
COMMAND ${CMAKE_COMMAND} -E copy "${CMAKE_BINARY_DIR}/proto/${file_name}_pb2.py"
"${PROJECT_SOURCE_DIR}/mindspore/train/"
DEPENDS protobuf::protoc ${abs_file}
COMMENT "Running C++ protocol buffer compiler on ${file}" VERBATIM)
else()
@ -114,7 +122,8 @@ function(ms_protobuf_generate_py c_var h_var py_var)
COMMAND protobuf::protoc -I${file_dir} --cpp_out=${CMAKE_BINARY_DIR}/proto ${abs_file}
COMMAND protobuf::protoc -I${file_dir} --python_out=${CMAKE_BINARY_DIR}/proto ${abs_file}
COMMAND protobuf::protoc -I${file_dir} --python_out=${CMAKE_BINARY_DIR}/proto ${abs_file}
COMMAND perl -pi -e "s/import (.+_pb2.*)/from . import \\1/" "${CMAKE_BINARY_DIR}/proto/${file_name}_pb2.py"
COMMAND perl -pi -e "s/import (.+_pb2.*)/from . import \\1/"
"${CMAKE_BINARY_DIR}/proto/${file_name}_pb2.py"
COMMAND cp "${CMAKE_BINARY_DIR}/proto/${file_name}_pb2.py" "${PROJECT_SOURCE_DIR}/mindspore/train/"
DEPENDS protobuf::protoc ${abs_file}
COMMENT "Running C++ protocol buffer compiler on ${file}" VERBATIM)
@ -124,5 +133,4 @@ function(ms_protobuf_generate_py c_var h_var py_var)
set(${c_var} ${${c_var}} PARENT_SCOPE)
set(${h_var} ${${h_var}} PARENT_SCOPE)
set(${py_var} ${${py_var}} PARENT_SCOPE)
endfunction()


@ -1,36 +1,60 @@
set(PYTHON_VERSION ${Python3_VERSION_MAJOR}.${Python3_VERSION_MINOR})
if(ENABLE_GITEE)
if(PYTHON_VERSION MATCHES "3.8")
if(PYTHON_VERSION MATCHES "3.9")
set(REQ_URL "https://gitee.com/mirrors/pybind11/repository/archive/v2.6.1.tar.gz")
set(MD5 "a9b7642031f35daf33a75fe837b3dd31")
elseif(PYTHON_VERSION MATCHES "3.8")
set(REQ_URL "https://gitee.com/mirrors/pybind11/repository/archive/v2.6.1.tar.gz")
set(MD5 "a9b7642031f35daf33a75fe837b3dd31")
elseif(PYTHON_VERSION MATCHES "3.7")
set(REQ_URL "https://gitee.com/mirrors/pybind11/repository/archive/v2.4.3.tar.gz")
set(MD5 "b473a37987ce456ea8cc7aab3f9486f9")
else()
message("Could not find 'Python 3.8' or 'Python 3.7'")
message("Could not find 'Python 3.8' or 'Python 3.7' or 'Python 3.9'")
return()
endif()
else()
if(PYTHON_VERSION MATCHES "3.8")
if(PYTHON_VERSION MATCHES "3.9")
set(REQ_URL "https://github.com/pybind/pybind11/archive/v2.6.1.tar.gz")
set(MD5 "32a7811f3db423df4ebfc731a28e5901")
elseif(PYTHON_VERSION MATCHES "3.8")
set(REQ_URL "https://github.com/pybind/pybind11/archive/v2.6.1.tar.gz")
set(MD5 "32a7811f3db423df4ebfc731a28e5901")
elseif(PYTHON_VERSION MATCHES "3.7")
set(REQ_URL "https://github.com/pybind/pybind11/archive/v2.4.3.tar.gz")
set(MD5 "62254c40f89925bb894be421fe4cdef2")
else()
message("Could not find 'Python 3.8' or 'Python 3.7'")
message("Could not find 'Python 3.8' or 'Python 3.7' or 'Python 3.9'")
return()
endif()
endif()
set(pybind11_CXXFLAGS "-D_FORTIFY_SOURCE=2 -O2")
set(pybind11_CFLAGS "-D_FORTIFY_SOURCE=2 -O2")
mindspore_add_pkg(pybind11
if(PYTHON_VERSION MATCHES "3.9")
mindspore_add_pkg(pybind11
VER 2.6.1
URL ${REQ_URL}
MD5 ${MD5}
CMAKE_OPTION -DPYBIND11_TEST=OFF -DPYBIND11_LTO_CXX_FLAGS=FALSE
)
elseif(PYTHON_VERSION MATCHES "3.8")
mindspore_add_pkg(pybind11
VER 2.6.1
URL ${REQ_URL}
MD5 ${MD5}
CMAKE_OPTION -DPYBIND11_TEST=OFF -DPYBIND11_LTO_CXX_FLAGS=FALSE
)
else()
mindspore_add_pkg(pybind11
VER 2.4.3
URL ${REQ_URL}
MD5 ${MD5}
CMAKE_OPTION -DPYBIND11_TEST=OFF -DPYBIND11_LTO_CXX_FLAGS=FALSE
)
endif()
include_directories(${pybind11_INC})
find_package(pybind11 REQUIRED)
set_property(TARGET pybind11::module PROPERTY IMPORTED_GLOBAL TRUE)


@ -8,26 +8,40 @@ endif()
if(WIN32)
set(sentencepiece_CXXFLAGS "-D_FORTIFY_SOURCE=2 -O2 -Wno-unused-result -Wno-stringop-overflow -Wno-format-extra-args -Wno-format")
set(sentencepiece_CXXFLAGS "-D_FORTIFY_SOURCE=2 -O2 -Wno-unused-result -Wno-stringop-overflow \
-Wno-format-extra-args -Wno-format")
set(sentencepiece_CFLAGS "-D_FORTIFY_SOURCE=2 -O2")
mindspore_add_pkg(sentencepiece
VER 0.1.92
LIBS sentencepiece sentencepiece_train
URL ${REQ_URL}
CMAKE_OPTION -DCMAKE_BUILD_TYPE=Release -DSPM_USE_BUILTIN_PROTOBUF=ON
MD5 ${MD5}
)
VER 0.1.92
LIBS sentencepiece sentencepiece_train
URL ${REQ_URL}
CMAKE_OPTION -DCMAKE_BUILD_TYPE=Release -DSPM_USE_BUILTIN_PROTOBUF=ON
MD5 ${MD5}
)
else()
set(sentencepiece_CXXFLAGS "-D_FORTIFY_SOURCE=2 -O2 -Wno-unused-result -Wno-sign-compare")
set(sentencepiece_CFLAGS "-D_FORTIFY_SOURCE=2 -O2")
mindspore_add_pkg(sentencepiece
if(ENABLE_GLIBCXX)
mindspore_add_pkg(sentencepiece
VER 0.1.92
LIBS sentencepiece sentencepiece_train
URL ${REQ_URL}
CMAKE_OPTION -DCMAKE_BUILD_TYPE=Release -DSPM_USE_BUILTIN_PROTOBUF=OFF -DSPM_ENABLE_SHARED=OFF -DPROTOBUF_INC=${protobuf_INC}
CMAKE_OPTION -DCMAKE_BUILD_TYPE=Release -DSPM_USE_BUILTIN_PROTOBUF=OFF -DSPM_ENABLE_SHARED=OFF
-DPROTOBUF_INC=${protobuf_INC}
MD5 ${MD5}
PATCHES ${CMAKE_SOURCE_DIR}/third_party/patch/sentencepiece/sentencepiece.patch001_cpu
)
else()
mindspore_add_pkg(sentencepiece
VER 0.1.92
LIBS sentencepiece sentencepiece_train
URL ${REQ_URL}
CMAKE_OPTION -DCMAKE_BUILD_TYPE=Release -DSPM_USE_BUILTIN_PROTOBUF=OFF -DSPM_ENABLE_SHARED=OFF
-DPROTOBUF_INC=${protobuf_INC}
MD5 ${MD5}
PATCHES ${CMAKE_SOURCE_DIR}/third_party/patch/sentencepiece/sentencepiece.patch001
)
endif()
endif()
include_directories(${sentencepiece_INC})
add_library(mindspore::sentencepiece ALIAS sentencepiece::sentencepiece)


@ -21,9 +21,11 @@ else()
set(sqlite_USE_STATIC_LIBS ON)
set(sqlite_CXXFLAGS)
if(${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
set(sqlite_CFLAGS "-fstack-protector-all -Wno-uninitialized -Wno-unused-parameter -fPIC -D_FORTIFY_SOURCE=2 -O2")
set(sqlite_CFLAGS "-fstack-protector-all -Wno-uninitialized -Wno-unused-parameter -fPIC -D_FORTIFY_SOURCE=2 \
-O2")
else()
set(sqlite_CFLAGS "-fstack-protector-all -Wno-maybe-uninitialized -Wno-unused-parameter -fPIC -D_FORTIFY_SOURCE=2 -O2")
set(sqlite_CFLAGS "-fstack-protector-all -Wno-maybe-uninitialized -Wno-unused-parameter -fPIC \
-D_FORTIFY_SOURCE=2 -O2")
set(sqlite_LDFLAGS "-Wl,-z,relro,-z,now,-z,noexecstack")
endif()
mindspore_add_pkg(sqlite


@ -2,7 +2,8 @@ if(ENABLE_GITEE)
set(REQ_URL "https://gitee.com/mirrors/incubator-tvm/repository/archive/v0.6.0.tar.gz")
set(MD5 "7b22965745cf1c6208a4e367fb86a585")
else()
set(REQ_URL "https://github.com/apache/incubator-tvm/release/download/v0.6.0/apache-tvm-src-v0.6.0-incubating.tar.gz")
set(REQ_URL
"https://github.com/apache/incubator-tvm/release/download/v0.6.0/apache-tvm-src-v0.6.0-incubating.tar.gz")
set(MD5 "2d77a005f0046d937b99c67de82f6438")
endif()
set(incubator_tvm_predict_CXXFLAGS "-D_FORTIFY_SOURCE=2 -O2")


@ -21,6 +21,11 @@ option(ENABLE_DEBUGGER "enable debugger" OFF)
option(ENABLE_IBVERBS "enable IBVERBS for parameter server" OFF)
option(ENABLE_PYTHON "Enable python" ON)
option(ENABLE_ACL "enable acl" OFF)
option(ENABLE_GLIBCXX "enable_glibcxx" OFF)
if(NOT ENABLE_D AND NOT ENABLE_TESTCASES AND NOT ENABLE_ACL)
set(ENABLE_GLIBCXX ON)
endif()
if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
if(WIN32)
@ -40,14 +45,10 @@ if(ENABLE_COVERAGE)
endif()
if(ENABLE_ASAN)
if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
set(OPTION_CXX_FLAGS "${OPTION_CXX_FLAGS} -fsanitize=address -fsanitize-recover=address \
-fno-omit-frame-pointer -fsanitize=undefined")
else()
set(OPTION_CXX_FLAGS "${OPTION_CXX_FLAGS} -fsanitize=address -fno-omit-frame-pointer \
-static-libsan -fsanitize=undefined")
set(OPTION_CXX_FLAGS "${OPTION_CXX_FLAGS} -fsanitize=address -fsanitize-recover=address -fno-omit-frame-pointer")
if(NOT CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
set(OPTION_CXX_FLAGS "${OPTION_CXX_FLAGS} -static-libsan")
endif()
set(OPTION_CXX_FLAGS "${OPTION_CXX_FLAGS} -mcmodel=medium")
endif()
if(DEBUG_MODE)


@ -5,6 +5,7 @@ include(GNUInstallDirs)
# set package information
set(CPACK_PACKAGE_NAME ${PROJECT_NAME})
set(CPACK_GENERATOR "External")
set(CPACK_CMAKE_GENERATOR "Ninja")
set(CPACK_EXTERNAL_PACKAGE_SCRIPT ${CMAKE_SOURCE_DIR}/cmake/package_script.cmake)
set(CPACK_EXTERNAL_ENABLE_STAGING true)
set(CPACK_TEMPORARY_PACKAGE_FILE_NAME ${CMAKE_SOURCE_DIR}/build/package/mindspore)
@ -76,7 +77,7 @@ install(
)
if(USE_GLOG)
file(GLOB_RECURSE GLOG_LIB_LIST ${glog_LIBPATH}/libglog*)
file(GLOB_RECURSE GLOG_LIB_LIST ${glog_LIBPATH}/libmindspore_glog*)
install(
FILES ${GLOG_LIB_LIST}
DESTINATION ${INSTALL_LIB_DIR}


@ -4,8 +4,6 @@ set(RUNTIME_PKG_NAME ${MAIN_DIR}-${RUNTIME_COMPONENT_NAME})
set(CODEGEN_ROOT_DIR ${RUNTIME_PKG_NAME}/tools/codegen)
set(CONVERTER_ROOT_DIR ${RUNTIME_PKG_NAME}/tools/converter)
set(BENCHMARK_ROOT_DIR ${RUNTIME_PKG_NAME}/tools/benchmark)
set(BENCHMARK_TRAIN_ROOT_DIR ${RUNTIME_PKG_NAME}/tools/benchmark_train)
set(CROPPER_ROOT_DIR ${RUNTIME_PKG_NAME}/tools/cropper)
if(SUPPORT_TRAIN)
@ -15,6 +13,9 @@ if(SUPPORT_TRAIN)
set(MIND_DATA_INC_DIR ${RUNTIME_PKG_NAME}/train/minddata/include)
set(MIND_DATA_LIB_DIR ${RUNTIME_PKG_NAME}/train/minddata/lib)
set(TURBO_DIR ${RUNTIME_PKG_NAME}/train/minddata/third_party/libjpeg-turbo)
set(MINDSPORE_LITE_LIB_NAME libmindspore-lite-train)
set(BENCHMARK_NAME benchmark_train)
set(BENCHMARK_ROOT_DIR ${RUNTIME_PKG_NAME}/tools/benchmark_train)
else()
set(RUNTIME_DIR ${RUNTIME_PKG_NAME}/inference)
set(RUNTIME_INC_DIR ${RUNTIME_PKG_NAME}/inference/include)
@ -22,6 +23,9 @@ else()
set(MIND_DATA_INC_DIR ${RUNTIME_PKG_NAME}/inference/minddata/include)
set(MIND_DATA_LIB_DIR ${RUNTIME_PKG_NAME}/inference/minddata/lib)
set(TURBO_DIR ${RUNTIME_PKG_NAME}/inference/minddata/third_party/libjpeg-turbo)
set(MINDSPORE_LITE_LIB_NAME libmindspore-lite)
set(BENCHMARK_NAME benchmark)
set(BENCHMARK_ROOT_DIR ${RUNTIME_PKG_NAME}/tools/benchmark)
endif()
if(BUILD_MINDDATA STREQUAL "full")
@ -141,22 +145,29 @@ if(PLATFORM_ARM64)
install(DIRECTORY ${TOP_DIR}/mindspore/lite/include/ DESTINATION ${RUNTIME_INC_DIR}
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h" PATTERN "train*" EXCLUDE)
endif()
install(FILES ${TOP_DIR}/mindspore/lite/build/src/libmindspore-lite.so DESTINATION ${RUNTIME_LIB_DIR}
install(FILES ${TOP_DIR}/mindspore/lite/build/src/${MINDSPORE_LITE_LIB_NAME}.so DESTINATION ${RUNTIME_LIB_DIR}
COMPONENT ${RUNTIME_COMPONENT_NAME})
install(FILES ${TOP_DIR}/mindspore/lite/build/src/libmindspore-lite.a DESTINATION ${RUNTIME_LIB_DIR}
install(FILES ${TOP_DIR}/mindspore/lite/build/src/${MINDSPORE_LITE_LIB_NAME}.a DESTINATION ${RUNTIME_LIB_DIR}
COMPONENT ${RUNTIME_COMPONENT_NAME})
install(FILES ${TOP_DIR}/mindspore/core/ir/dtype/type_id.h DESTINATION ${RUNTIME_INC_DIR}/ir/dtype
COMPONENT ${RUNTIME_COMPONENT_NAME})
install(DIRECTORY ${TOP_DIR}/include/api/ DESTINATION ${RUNTIME_INC_DIR}/api
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h" PATTERN "ascend* ops*" EXCLUDE)
install(DIRECTORY ${TOP_DIR}/mindspore/lite/build/operator_library DESTINATION ${CODEGEN_ROOT_DIR}
COMPONENT ${RUNTIME_COMPONENT_NAME})
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h" PATTERN "ops*" EXCLUDE)
file(GLOB NNACL_FILES GLOB ${TOP_DIR}/mindspore/lite/nnacl/*.h)
install(FILES ${NNACL_FILES} DESTINATION ${CODEGEN_ROOT_DIR}/include/nnacl COMPONENT ${RUNTIME_COMPONENT_NAME})
install(DIRECTORY ${TOP_DIR}/mindspore/lite/nnacl/base DESTINATION ${CODEGEN_ROOT_DIR}/include/nnacl
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h")
install(DIRECTORY ${TOP_DIR}/mindspore/lite/nnacl/int8 DESTINATION ${CODEGEN_ROOT_DIR}/include/nnacl
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h")
install(DIRECTORY ${TOP_DIR}/mindspore/lite/nnacl/fp32 DESTINATION ${CODEGEN_ROOT_DIR}/include/nnacl
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h")
install(DIRECTORY ${TOP_DIR}/mindspore/lite/nnacl/intrinsics DESTINATION ${CODEGEN_ROOT_DIR}/include/nnacl
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h")
install(DIRECTORY ${TOP_DIR}/mindspore/lite/micro/coder/wrapper DESTINATION ${CODEGEN_ROOT_DIR}/include
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h")
install(TARGETS wrapper ARCHIVE DESTINATION ${CODEGEN_ROOT_DIR}/lib COMPONENT ${RUNTIME_COMPONENT_NAME})
if(ENABLE_TOOLS)
install(TARGETS benchmark RUNTIME DESTINATION ${BENCHMARK_ROOT_DIR} COMPONENT ${RUNTIME_COMPONENT_NAME})
if(SUPPORT_TRAIN)
install(TARGETS benchmark_train RUNTIME DESTINATION ${BENCHMARK_TRAIN_ROOT_DIR}
COMPONENT ${RUNTIME_COMPONENT_NAME})
endif()
install(TARGETS ${BENCHMARK_NAME} RUNTIME DESTINATION ${BENCHMARK_ROOT_DIR} COMPONENT ${RUNTIME_COMPONENT_NAME})
endif()
elseif(PLATFORM_ARM32)
if(SUPPORT_TRAIN)
@ -166,22 +177,29 @@ elseif(PLATFORM_ARM32)
install(DIRECTORY ${TOP_DIR}/mindspore/lite/include/ DESTINATION ${RUNTIME_INC_DIR}
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h" PATTERN "train*" EXCLUDE)
endif()
install(FILES ${TOP_DIR}/mindspore/lite/build/src/libmindspore-lite.so DESTINATION ${RUNTIME_LIB_DIR}
install(FILES ${TOP_DIR}/mindspore/lite/build/src/${MINDSPORE_LITE_LIB_NAME}.so DESTINATION ${RUNTIME_LIB_DIR}
COMPONENT ${RUNTIME_COMPONENT_NAME})
install(FILES ${TOP_DIR}/mindspore/lite/build/src/libmindspore-lite.a DESTINATION ${RUNTIME_LIB_DIR}
install(FILES ${TOP_DIR}/mindspore/lite/build/src/${MINDSPORE_LITE_LIB_NAME}.a DESTINATION ${RUNTIME_LIB_DIR}
COMPONENT ${RUNTIME_COMPONENT_NAME})
install(FILES ${TOP_DIR}/mindspore/core/ir/dtype/type_id.h DESTINATION ${RUNTIME_INC_DIR}/ir/dtype
COMPONENT ${RUNTIME_COMPONENT_NAME})
install(DIRECTORY ${TOP_DIR}/include/api/ DESTINATION ${RUNTIME_INC_DIR}/api
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h" PATTERN "ascend*" EXCLUDE)
install(DIRECTORY ${TOP_DIR}/mindspore/lite/build/operator_library DESTINATION ${CODEGEN_ROOT_DIR}
COMPONENT ${RUNTIME_COMPONENT_NAME})
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h" PATTERN "ops*" EXCLUDE)
file(GLOB NNACL_FILES GLOB ${TOP_DIR}/mindspore/lite/nnacl/*.h)
install(FILES ${NNACL_FILES} DESTINATION ${CODEGEN_ROOT_DIR}/include/nnacl COMPONENT ${RUNTIME_COMPONENT_NAME})
install(DIRECTORY ${TOP_DIR}/mindspore/lite/nnacl/base DESTINATION ${CODEGEN_ROOT_DIR}/include/nnacl
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h")
install(DIRECTORY ${TOP_DIR}/mindspore/lite/nnacl/int8 DESTINATION ${CODEGEN_ROOT_DIR}/include/nnacl
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h")
install(DIRECTORY ${TOP_DIR}/mindspore/lite/nnacl/fp32 DESTINATION ${CODEGEN_ROOT_DIR}/include/nnacl
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h")
install(DIRECTORY ${TOP_DIR}/mindspore/lite/nnacl/intrinsics DESTINATION ${CODEGEN_ROOT_DIR}/include/nnacl
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h")
install(DIRECTORY ${TOP_DIR}/mindspore/lite/micro/coder/wrapper DESTINATION ${CODEGEN_ROOT_DIR}/include
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h")
install(TARGETS wrapper ARCHIVE DESTINATION ${CODEGEN_ROOT_DIR}/lib COMPONENT ${RUNTIME_COMPONENT_NAME})
if(ENABLE_TOOLS)
install(TARGETS benchmark RUNTIME DESTINATION ${BENCHMARK_ROOT_DIR} COMPONENT ${RUNTIME_COMPONENT_NAME})
if(SUPPORT_TRAIN)
install(TARGETS benchmark_train RUNTIME DESTINATION ${BENCHMARK_TRAIN_ROOT_DIR}
COMPONENT ${RUNTIME_COMPONENT_NAME})
endif()
install(TARGETS ${BENCHMARK_NAME} RUNTIME DESTINATION ${BENCHMARK_ROOT_DIR} COMPONENT ${RUNTIME_COMPONENT_NAME})
endif()
elseif(WIN32)
get_filename_component(CXX_DIR ${CMAKE_CXX_COMPILER} PATH)
@@ -198,7 +216,7 @@ elseif(WIN32)
install(TARGETS codegen RUNTIME DESTINATION ${CODEGEN_ROOT_DIR} COMPONENT ${RUNTIME_COMPONENT_NAME})
endif()
if(ENABLE_TOOLS)
install(TARGETS benchmark RUNTIME DESTINATION ${BENCHMARK_ROOT_DIR} COMPONENT ${RUNTIME_COMPONENT_NAME})
install(TARGETS ${BENCHMARK_NAME} RUNTIME DESTINATION ${BENCHMARK_ROOT_DIR} COMPONENT ${RUNTIME_COMPONENT_NAME})
endif()
install(FILES ${LIB_LIST} DESTINATION ${RUNTIME_LIB_DIR} COMPONENT ${RUNTIME_COMPONENT_NAME})
install(DIRECTORY ${flatbuffers_INC} DESTINATION ${RUNTIME_INC_DIR}/third_party/
@@ -213,12 +231,12 @@ elseif(WIN32)
install(FILES ${TOP_DIR}/mindspore/core/ir/dtype/type_id.h DESTINATION ${RUNTIME_INC_DIR}/ir/dtype
COMPONENT ${RUNTIME_COMPONENT_NAME})
install(DIRECTORY ${TOP_DIR}/include/api/ DESTINATION ${RUNTIME_INC_DIR}/api
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h" PATTERN "ascend*" EXCLUDE)
install(FILES ${TOP_DIR}/build/mindspore/src/libmindspore-lite.a DESTINATION ${RUNTIME_LIB_DIR}
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h" PATTERN "ops*" EXCLUDE)
install(FILES ${TOP_DIR}/build/mindspore/src/${MINDSPORE_LITE_LIB_NAME}.a DESTINATION ${RUNTIME_LIB_DIR}
COMPONENT ${RUNTIME_COMPONENT_NAME})
install(FILES ${TOP_DIR}/build/mindspore/src/libmindspore-lite.dll.a DESTINATION ${RUNTIME_LIB_DIR}
install(FILES ${TOP_DIR}/build/mindspore/src/${MINDSPORE_LITE_LIB_NAME}.dll.a DESTINATION ${RUNTIME_LIB_DIR}
COMPONENT ${RUNTIME_COMPONENT_NAME})
install(FILES ${TOP_DIR}/build/mindspore/src/libmindspore-lite.dll DESTINATION ${RUNTIME_LIB_DIR}
install(FILES ${TOP_DIR}/build/mindspore/src/${MINDSPORE_LITE_LIB_NAME}.dll DESTINATION ${RUNTIME_LIB_DIR}
COMPONENT ${RUNTIME_COMPONENT_NAME})
else()
if(SUPPORT_TRAIN)
@@ -231,10 +249,10 @@ else()
install(FILES ${TOP_DIR}/mindspore/core/ir/dtype/type_id.h DESTINATION ${RUNTIME_INC_DIR}/ir/dtype
COMPONENT ${RUNTIME_COMPONENT_NAME})
install(DIRECTORY ${TOP_DIR}/include/api/ DESTINATION ${RUNTIME_INC_DIR}/api
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h" PATTERN "ascend*" EXCLUDE)
install(FILES ${TOP_DIR}/mindspore/lite/build/src/libmindspore-lite.so DESTINATION ${RUNTIME_LIB_DIR}
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h" PATTERN "ops*" EXCLUDE)
install(FILES ${TOP_DIR}/mindspore/lite/build/src/${MINDSPORE_LITE_LIB_NAME}.so DESTINATION ${RUNTIME_LIB_DIR}
COMPONENT ${RUNTIME_COMPONENT_NAME})
install(FILES ${TOP_DIR}/mindspore/lite/build/src/libmindspore-lite.a DESTINATION ${RUNTIME_LIB_DIR}
install(FILES ${TOP_DIR}/mindspore/lite/build/src/${MINDSPORE_LITE_LIB_NAME}.a DESTINATION ${RUNTIME_LIB_DIR}
COMPONENT ${RUNTIME_COMPONENT_NAME})
if(ENABLE_CONVERTER)
install(TARGETS converter_lite RUNTIME DESTINATION ${CONVERTER_ROOT_DIR}/converter
@@ -244,16 +262,32 @@ else()
install(FILES ${glog_LIBPATH}/libglog.so.0.4.0
DESTINATION ${CONVERTER_ROOT_DIR}/third_party/glog/lib RENAME libglog.so.0
COMPONENT ${RUNTIME_COMPONENT_NAME})
install(DIRECTORY ${TOP_DIR}/mindspore/lite/build/operator_library DESTINATION ${CODEGEN_ROOT_DIR}
file(GLOB NNACL_FILES GLOB ${TOP_DIR}/mindspore/lite/nnacl/*.h)
install(FILES ${NNACL_FILES} DESTINATION ${CODEGEN_ROOT_DIR}/include/nnacl COMPONENT ${RUNTIME_COMPONENT_NAME})
install(DIRECTORY ${TOP_DIR}/mindspore/lite/nnacl/base DESTINATION ${CODEGEN_ROOT_DIR}/include/nnacl
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h")
install(DIRECTORY ${TOP_DIR}/mindspore/lite/nnacl/int8 DESTINATION ${CODEGEN_ROOT_DIR}/include/nnacl
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h")
install(DIRECTORY ${TOP_DIR}/mindspore/lite/nnacl/fp32 DESTINATION ${CODEGEN_ROOT_DIR}/include/nnacl
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h")
install(DIRECTORY ${TOP_DIR}/mindspore/lite/nnacl/intrinsics DESTINATION ${CODEGEN_ROOT_DIR}/include/nnacl
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h")
install(DIRECTORY ${TOP_DIR}/mindspore/lite/micro/coder/wrapper DESTINATION ${CODEGEN_ROOT_DIR}/include
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h")
install(TARGETS wrapper ARCHIVE DESTINATION ${CODEGEN_ROOT_DIR}/lib COMPONENT ${RUNTIME_COMPONENT_NAME})
set(MICRO_CMSIS_DIR ${CMAKE_BINARY_DIR}/cmsis/CMSIS)
install(DIRECTORY ${MICRO_CMSIS_DIR}/Core/Include DESTINATION ${CODEGEN_ROOT_DIR}/third_party/include/CMSIS/Core
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h")
install(DIRECTORY ${MICRO_CMSIS_DIR}/DSP/Include DESTINATION ${CODEGEN_ROOT_DIR}/third_party/include/CMSIS/DSP
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h")
install(DIRECTORY ${MICRO_CMSIS_DIR}/NN/Include DESTINATION ${CODEGEN_ROOT_DIR}/third_party/include/CMSIS/NN
COMPONENT ${RUNTIME_COMPONENT_NAME} FILES_MATCHING PATTERN "*.h")
install(TARGETS cmsis_nn ARCHIVE DESTINATION ${CODEGEN_ROOT_DIR}/third_party/lib
COMPONENT ${RUNTIME_COMPONENT_NAME})
install(TARGETS codegen RUNTIME DESTINATION ${CODEGEN_ROOT_DIR} COMPONENT ${RUNTIME_COMPONENT_NAME})
endif()
if(ENABLE_TOOLS)
install(TARGETS benchmark RUNTIME DESTINATION ${BENCHMARK_ROOT_DIR} COMPONENT ${RUNTIME_COMPONENT_NAME})
if(SUPPORT_TRAIN)
install(TARGETS benchmark_train RUNTIME DESTINATION ${BENCHMARK_TRAIN_ROOT_DIR}
COMPONENT ${RUNTIME_COMPONENT_NAME})
endif()
install(TARGETS ${BENCHMARK_NAME} RUNTIME DESTINATION ${BENCHMARK_ROOT_DIR} COMPONENT ${RUNTIME_COMPONENT_NAME})
install(TARGETS cropper RUNTIME DESTINATION ${CROPPER_ROOT_DIR} COMPONENT ${RUNTIME_COMPONENT_NAME})
install(FILES ${TOP_DIR}/mindspore/lite/build/tools/cropper/cropper_mapping_cpu.cfg
DESTINATION ${CROPPER_ROOT_DIR} COMPONENT ${RUNTIME_COMPONENT_NAME})


@@ -1,5 +1,5 @@
# find exec
find_package(Python3 3.7 COMPONENTS Interpreter)
find_package(Python3 COMPONENTS Interpreter)
if(NOT Python3_FOUND)
message(FATAL_ERROR "No python3 found.")
endif()
@@ -7,8 +7,8 @@ endif()
set(PYTHON ${Python3_EXECUTABLE})
set(PYTHON_VERSION ${Python3_VERSION_MAJOR}.${Python3_VERSION_MINOR})
if(NOT (PYTHON_VERSION MATCHES "3.8" OR PYTHON_VERSION MATCHES "3.7"))
message(FATAL_ERROR "FIND PYTHON VERSION ${PYTHON_VERSION} BUT CAN NOT MATCH PYTHON VERSION 3.8 OR 3.7")
if(NOT (PYTHON_VERSION MATCHES "3.9" OR PYTHON_VERSION MATCHES "3.8" OR PYTHON_VERSION MATCHES "3.7"))
message(FATAL_ERROR "FIND PYTHON VERSION ${PYTHON_VERSION} BUT CAN NOT MATCH PYTHON VERSION 3.9 OR 3.8 OR 3.7")
endif()
find_package(Git)
@@ -24,32 +24,38 @@ set(MS_PACK_ROOT_DIR ${MS_ROOT_DIR}/build/package)
# set package file name
if(CMAKE_SYSTEM_NAME MATCHES "Linux")
if(PYTHON_VERSION MATCHES "3.8")
if(PYTHON_VERSION MATCHES "3.9")
set(PY_TAGS "cp39-cp39")
elseif(PYTHON_VERSION MATCHES "3.8")
set(PY_TAGS "cp38-cp38")
elseif(PYTHON_VERSION MATCHES "3.7")
set(PY_TAGS "cp37-cp37m")
else()
message("Could not find 'Python 3.8' or 'Python 3.7'")
message("Could not find 'Python 3.9' OR 'Python 3.8' or 'Python 3.7'")
return()
endif()
string(TOLOWER linux_${CMAKE_HOST_SYSTEM_PROCESSOR} PLATFORM_TAG)
elseif(CMAKE_SYSTEM_NAME MATCHES "Darwin")
if(PYTHON_VERSION MATCHES "3.8")
if(PYTHON_VERSION MATCHES "3.9")
set(PY_TAGS "py39-none")
elseif(PYTHON_VERSION MATCHES "3.8")
set(PY_TAGS "py38-none")
elseif(PYTHON_VERSION MATCHES "3.7")
set(PY_TAGS "py37-none")
else()
message("Could not find 'Python 3.8' or 'Python 3.7'")
message("Could not find 'Python 3.9' OR 'Python 3.8' or 'Python 3.7'")
return()
endif()
set(PLATFORM_TAG "any")
elseif(CMAKE_SYSTEM_NAME MATCHES "Windows")
if(PYTHON_VERSION MATCHES "3.8")
if(PYTHON_VERSION MATCHES "3.9")
set(PY_TAGS "cp39-cp39")
elseif(PYTHON_VERSION MATCHES "3.8")
set(PY_TAGS "cp38-cp38")
elseif(PYTHON_VERSION MATCHES "3.7")
set(PY_TAGS "cp37-cp37m")
else()
message("Could not find 'Python 3.8' or 'Python 3.7'")
message("Could not find 'Python 3.9' OR 'Python 3.8' or 'Python 3.7'")
return()
endif()
set(PLATFORM_TAG "win_amd64")

File diff suppressed because one or more lines are too long

@@ -1 +1 @@
Subproject commit 40e5c42a12c4daa1530e8db9d006d5b3be5b378f
Subproject commit 8770bfcdd73777207d562597e21c63179af598f2


@@ -103,8 +103,9 @@ class MS_API GraphCell final : public Cell<GraphCell> {
std::vector<MSTensor> GetOutputs();
private:
friend class Model;
friend class ModelImpl;
Status Load();
Status Load(uint32_t device_id);
std::shared_ptr<Graph> graph_;
std::shared_ptr<GraphImpl> executor_;


@@ -24,162 +24,219 @@
#include "include/api/dual_abi_helper.h"
namespace mindspore {
constexpr auto kDeviceTypeAscend310 = "Ascend310";
constexpr auto kDeviceTypeAscend910 = "Ascend910";
constexpr auto kDeviceTypeGPU = "GPU";
enum DeviceType {
kCPU = 0,
kMaliGPU,
kNvidiaGPU,
kKirinNPU,
kAscend910,
kAscend310,
// add new type here
kInvalidDeviceType = 100,
};
struct MS_API Context {
class Allocator;
class DeviceInfoContext;
class MS_API Context {
public:
Context();
virtual ~Context() = default;
~Context() = default;
void SetThreadNum(int32_t thread_num);
int32_t GetThreadNum() const;
void SetAllocator(const std::shared_ptr<Allocator> &allocator);
std::shared_ptr<Allocator> GetAllocator() const;
std::vector<std::shared_ptr<DeviceInfoContext>> &MutableDeviceInfo();
private:
struct Data;
std::shared_ptr<Data> data;
std::shared_ptr<Data> data_;
};
struct MS_API GlobalContext : public Context {
class MS_API DeviceInfoContext : public std::enable_shared_from_this<DeviceInfoContext> {
public:
static std::shared_ptr<Context> GetGlobalContext();
struct Data;
static inline void SetGlobalDeviceTarget(const std::string &device_target);
static inline std::string GetGlobalDeviceTarget();
DeviceInfoContext();
virtual ~DeviceInfoContext() = default;
virtual enum DeviceType GetDeviceType() const = 0;
static void SetGlobalDeviceID(const uint32_t &device_id);
static uint32_t GetGlobalDeviceID();
template <class T>
std::shared_ptr<T> Cast() {
static_assert(std::is_base_of<DeviceInfoContext, T>::value, "Wrong cast type.");
if (GetDeviceType() != T().GetDeviceType()) {
return nullptr;
}
static inline void SetGlobalDumpConfigPath(const std::string &cfg_path);
static inline std::string GetGlobalDumpConfigPath();
return std::static_pointer_cast<T>(shared_from_this());
}
protected:
std::shared_ptr<Data> data_;
};
class MS_API CPUDeviceInfo : public DeviceInfoContext {
public:
enum DeviceType GetDeviceType() const override { return DeviceType::kCPU; };
/// \brief Set the thread affinity to CPU cores.
///
/// \param mode: 0: no affinities, 1: big cores first, 2: little cores first
void SetThreadAffinity(int mode);
int GetThreadAffinity() const;
void SetEnableFP16(bool is_fp16);
bool GetEnableFP16() const;
};
class MS_API MaliGPUDeviceInfo : public DeviceInfoContext {
public:
enum DeviceType GetDeviceType() const override { return DeviceType::kMaliGPU; };
void SetEnableFP16(bool is_fp16);
bool GetEnableFP16() const;
};
class MS_API KirinNPUDeviceInfo : public DeviceInfoContext {
public:
enum DeviceType GetDeviceType() const override { return DeviceType::kKirinNPU; };
void SetFrequency(int frequency);
int GetFrequency() const;
};
class MS_API NvidiaGPUDeviceInfo : public DeviceInfoContext {
public:
enum DeviceType GetDeviceType() const override { return DeviceType::kNvidiaGPU; };
void SetDeviceID(uint32_t device_id);
uint32_t GetDeviceID() const;
void SetGpuTrtInferMode(bool gpu_trt_infer_mode);
bool GetGpuTrtInferMode() const;
};
class MS_API Ascend910DeviceInfo : public DeviceInfoContext {
public:
enum DeviceType GetDeviceType() const override { return DeviceType::kAscend910; };
void SetDeviceID(uint32_t device_id);
uint32_t GetDeviceID() const;
};
class MS_API Ascend310DeviceInfo : public DeviceInfoContext {
public:
enum DeviceType GetDeviceType() const override { return DeviceType::kAscend310; };
void SetDeviceID(uint32_t device_id);
uint32_t GetDeviceID() const;
inline void SetDumpConfigPath(const std::string &cfg_path);
inline std::string GetDumpConfigPath() const;
// aipp config file
inline void SetInsertOpConfigPath(const std::string &cfg_path);
inline std::string GetInsertOpConfigPath() const;
// nchw or nhwc
inline void SetInputFormat(const std::string &format);
inline std::string GetInputFormat() const;
// Mandatory when dynamic batch is enabled, e.g. "input_op_name1: 1,2,3,4;input_op_name2: 4,3,2,1"
inline void SetInputShape(const std::string &shape);
inline std::string GetInputShape() const;
void SetInputShapeMap(const std::map<int, std::vector<int>> &shape);
std::map<int, std::vector<int>> GetInputShapeMap() const;
void SetDynamicBatchSize(const std::vector<size_t> &dynamic_batch_size);
inline std::string GetDynamicBatchSize() const;
// FP32, UINT8 or FP16, default as FP32
void SetOutputType(enum DataType output_type);
enum DataType GetOutputType() const;
// "force_fp16", "allow_fp32_to_fp16", "must_keep_origin_dtype" or "allow_mix_precision", default as "force_fp16"
inline void SetPrecisionMode(const std::string &precision_mode);
inline std::string GetPrecisionMode() const;
// Optional "high_performance" and "high_precision", "high_performance" is set as default
inline void SetOpSelectImplMode(const std::string &op_select_impl_mode);
inline std::string GetOpSelectImplMode() const;
inline void SetFusionSwitchConfigPath(const std::string &cfg_path);
inline std::string GetFusionSwitchConfigPath() const;
// Optional "l1_optimize", "l2_optimize", "off_optimize" or "l1_and_l2_optimize", default as "l2_optimize"
inline void SetBufferOptimizeMode(const std::string &buffer_optimize_mode);
inline std::string GetBufferOptimizeMode() const;
private:
// api without std::string
static void SetGlobalDeviceTarget(const std::vector<char> &device_target);
static std::vector<char> GetGlobalDeviceTargetChar();
void SetDumpConfigPath(const std::vector<char> &cfg_path);
std::vector<char> GetDumpConfigPathChar() const;
static void SetGlobalDumpConfigPath(const std::vector<char> &cfg_path);
static std::vector<char> GetGlobalDumpConfigPathChar();
void SetInsertOpConfigPath(const std::vector<char> &cfg_path);
std::vector<char> GetInsertOpConfigPathChar() const;
void SetInputFormat(const std::vector<char> &format);
std::vector<char> GetInputFormatChar() const;
void SetInputShape(const std::vector<char> &shape);
std::vector<char> GetInputShapeChar() const;
std::vector<char> GetDynamicBatchSizeChar() const;
void SetPrecisionMode(const std::vector<char> &precision_mode);
std::vector<char> GetPrecisionModeChar() const;
void SetOpSelectImplMode(const std::vector<char> &op_select_impl_mode);
std::vector<char> GetOpSelectImplModeChar() const;
void SetFusionSwitchConfigPath(const std::vector<char> &cfg_path);
std::vector<char> GetFusionSwitchConfigPathChar() const;
void SetBufferOptimizeMode(const std::vector<char> &buffer_optimize_mode);
std::vector<char> GetBufferOptimizeModeChar() const;
};
struct MS_API ModelContext : public Context {
public:
static inline void SetInsertOpConfigPath(const std::shared_ptr<Context> &context, const std::string &cfg_path);
static inline std::string GetInsertOpConfigPath(const std::shared_ptr<Context> &context);
void Ascend310DeviceInfo::SetDumpConfigPath(const std::string &cfg_path) { SetDumpConfigPath(StringToChar(cfg_path)); }
std::string Ascend310DeviceInfo::GetDumpConfigPath() const { return CharToString(GetDumpConfigPathChar()); }
static inline void SetInputFormat(const std::shared_ptr<Context> &context, const std::string &format);
static inline std::string GetInputFormat(const std::shared_ptr<Context> &context);
static inline void SetInputShape(const std::shared_ptr<Context> &context, const std::string &shape);
static inline std::string GetInputShape(const std::shared_ptr<Context> &context);
static void SetInputShapeMap(const std::shared_ptr<Context> &context, const std::map<int, std::vector<int>> &shape);
static std::map<int, std::vector<int>> GetInputShapeMap(const std::shared_ptr<Context> &context);
static void SetDynamicBatchSize(const std::shared_ptr<Context> &context,
const std::vector<size_t> &dynamic_batch_size);
static inline std::string GetDynamicBatchSize(const std::shared_ptr<Context> &context);
static void SetOutputType(const std::shared_ptr<Context> &context, enum DataType output_type);
static enum DataType GetOutputType(const std::shared_ptr<Context> &context);
static inline void SetPrecisionMode(const std::shared_ptr<Context> &context, const std::string &precision_mode);
static inline std::string GetPrecisionMode(const std::shared_ptr<Context> &context);
static inline void SetOpSelectImplMode(const std::shared_ptr<Context> &context,
const std::string &op_select_impl_mode);
static inline std::string GetOpSelectImplMode(const std::shared_ptr<Context> &context);
static inline void SetFusionSwitchConfigPath(const std::shared_ptr<Context> &context, const std::string &cfg_path);
static inline std::string GetFusionSwitchConfigPath(const std::shared_ptr<Context> &context);
static inline void SetGpuTrtInferMode(const std::shared_ptr<Context> &context, const std::string &gpu_trt_infer_mode);
static inline std::string GetGpuTrtInferMode(const std::shared_ptr<Context> &context);
private:
// api without std::string
static void SetInsertOpConfigPath(const std::shared_ptr<Context> &context, const std::vector<char> &cfg_path);
static std::vector<char> GetInsertOpConfigPathChar(const std::shared_ptr<Context> &context);
static void SetInputFormat(const std::shared_ptr<Context> &context, const std::vector<char> &format);
static std::vector<char> GetInputFormatChar(const std::shared_ptr<Context> &context);
static void SetInputShape(const std::shared_ptr<Context> &context, const std::vector<char> &shape);
static std::vector<char> GetInputShapeChar(const std::shared_ptr<Context> &context);
static void SetPrecisionMode(const std::shared_ptr<Context> &context, const std::vector<char> &precision_mode);
static std::vector<char> GetPrecisionModeChar(const std::shared_ptr<Context> &context);
static void SetOpSelectImplMode(const std::shared_ptr<Context> &context,
const std::vector<char> &op_select_impl_mode);
static std::vector<char> GetOpSelectImplModeChar(const std::shared_ptr<Context> &context);
static void SetFusionSwitchConfigPath(const std::shared_ptr<Context> &context, const std::vector<char> &cfg_path);
static std::vector<char> GetFusionSwitchConfigPathChar(const std::shared_ptr<Context> &context);
static void SetGpuTrtInferMode(const std::shared_ptr<Context> &context, const std::vector<char> &gpu_trt_infer_mode);
static std::vector<char> GetGpuTrtInferModeChar(const std::shared_ptr<Context> &context);
static std::vector<char> GetDynamicBatchSizeChar(const std::shared_ptr<Context> &context);
};
void GlobalContext::SetGlobalDeviceTarget(const std::string &device_target) {
SetGlobalDeviceTarget(StringToChar(device_target));
void Ascend310DeviceInfo::SetInsertOpConfigPath(const std::string &cfg_path) {
SetInsertOpConfigPath(StringToChar(cfg_path));
}
std::string GlobalContext::GetGlobalDeviceTarget() { return CharToString(GetGlobalDeviceTargetChar()); }
std::string Ascend310DeviceInfo::GetInsertOpConfigPath() const { return CharToString(GetInsertOpConfigPathChar()); }
void GlobalContext::SetGlobalDumpConfigPath(const std::string &cfg_path) {
SetGlobalDumpConfigPath(StringToChar(cfg_path));
}
std::string GlobalContext::GetGlobalDumpConfigPath() { return CharToString(GetGlobalDumpConfigPathChar()); }
void Ascend310DeviceInfo::SetInputFormat(const std::string &format) { SetInputFormat(StringToChar(format)); }
std::string Ascend310DeviceInfo::GetInputFormat() const { return CharToString(GetInputFormatChar()); }
void ModelContext::SetInsertOpConfigPath(const std::shared_ptr<Context> &context, const std::string &cfg_path) {
SetInsertOpConfigPath(context, StringToChar(cfg_path));
void Ascend310DeviceInfo::SetInputShape(const std::string &shape) { SetInputShape(StringToChar(shape)); }
std::string Ascend310DeviceInfo::GetInputShape() const { return CharToString(GetInputShapeChar()); }
std::string Ascend310DeviceInfo::GetDynamicBatchSize() const { return CharToString(GetDynamicBatchSizeChar()); }
void Ascend310DeviceInfo::SetPrecisionMode(const std::string &precision_mode) {
SetPrecisionMode(StringToChar(precision_mode));
}
std::string ModelContext::GetInsertOpConfigPath(const std::shared_ptr<Context> &context) {
return CharToString(GetInsertOpConfigPathChar(context));
std::string Ascend310DeviceInfo::GetPrecisionMode() const { return CharToString(GetPrecisionModeChar()); }
void Ascend310DeviceInfo::SetOpSelectImplMode(const std::string &op_select_impl_mode) {
SetOpSelectImplMode(StringToChar(op_select_impl_mode));
}
std::string Ascend310DeviceInfo::GetOpSelectImplMode() const { return CharToString(GetOpSelectImplModeChar()); }
void Ascend310DeviceInfo::SetFusionSwitchConfigPath(const std::string &cfg_path) {
SetFusionSwitchConfigPath(StringToChar(cfg_path));
}
std::string Ascend310DeviceInfo::GetFusionSwitchConfigPath() const {
return CharToString(GetFusionSwitchConfigPathChar());
}
void ModelContext::SetInputFormat(const std::shared_ptr<Context> &context, const std::string &format) {
SetInputFormat(context, StringToChar(format));
}
std::string ModelContext::GetInputFormat(const std::shared_ptr<Context> &context) {
return CharToString(GetInputFormatChar(context));
}
void ModelContext::SetInputShape(const std::shared_ptr<Context> &context, const std::string &shape) {
SetInputShape(context, StringToChar(shape));
}
std::string ModelContext::GetInputShape(const std::shared_ptr<Context> &context) {
return CharToString(GetInputShapeChar(context));
}
void ModelContext::SetPrecisionMode(const std::shared_ptr<Context> &context, const std::string &precision_mode) {
SetPrecisionMode(context, StringToChar(precision_mode));
}
std::string ModelContext::GetPrecisionMode(const std::shared_ptr<Context> &context) {
return CharToString(GetPrecisionModeChar(context));
}
void ModelContext::SetOpSelectImplMode(const std::shared_ptr<Context> &context,
const std::string &op_select_impl_mode) {
SetOpSelectImplMode(context, StringToChar(op_select_impl_mode));
}
std::string ModelContext::GetOpSelectImplMode(const std::shared_ptr<Context> &context) {
return CharToString(GetOpSelectImplModeChar(context));
}
void ModelContext::SetFusionSwitchConfigPath(const std::shared_ptr<Context> &context, const std::string &cfg_path) {
SetFusionSwitchConfigPath(context, StringToChar(cfg_path));
}
std::string ModelContext::GetFusionSwitchConfigPath(const std::shared_ptr<Context> &context) {
return CharToString(GetFusionSwitchConfigPathChar(context));
}
std::string ModelContext::GetDynamicBatchSize(const std::shared_ptr<Context> &context) {
return CharToString(GetDynamicBatchSizeChar(context));
}
void ModelContext::SetGpuTrtInferMode(const std::shared_ptr<Context> &context, const std::string &gpu_trt_infer_mode) {
SetGpuTrtInferMode(context, StringToChar(gpu_trt_infer_mode));
}
std::string ModelContext::GetGpuTrtInferMode(const std::shared_ptr<Context> &context) {
return CharToString(GetGpuTrtInferModeChar(context));
void Ascend310DeviceInfo::SetBufferOptimizeMode(const std::string &buffer_optimize_mode) {
SetBufferOptimizeMode(StringToChar(buffer_optimize_mode));
}
std::string Ascend310DeviceInfo::GetBufferOptimizeMode() const { return CharToString(GetBufferOptimizeModeChar()); }
} // namespace mindspore
#endif // MINDSPORE_INCLUDE_API_CONTEXT_H
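The hunk above replaces the static GlobalContext/ModelContext configuration with per-device DeviceInfoContext objects attached to a Context instance. A minimal usage sketch under that reading of the header; the thread count, device id, and precision mode values are illustrative, not taken from the diff:

```cpp
#include <memory>
#include "include/api/context.h"

std::shared_ptr<mindspore::Context> MakeAscend310Context() {
  auto context = std::make_shared<mindspore::Context>();
  context->SetThreadNum(4);  // illustrative value

  auto ascend = std::make_shared<mindspore::Ascend310DeviceInfo>();
  ascend->SetDeviceID(0);                          // replaces GlobalContext::SetGlobalDeviceID
  ascend->SetPrecisionMode("allow_fp32_to_fp16");  // one of the modes listed in the header

  // Device settings now live on a context instance instead of process-wide state.
  context->MutableDeviceInfo().push_back(ascend);

  // Cast<T>() recovers the concrete type from the stored base pointer.
  auto same = context->MutableDeviceInfo()[0]->Cast<mindspore::Ascend310DeviceInfo>();
  (void)same;
  return context;
}
```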


@@ -27,6 +27,7 @@ namespace mindspore {
class MS_API Graph {
public:
class GraphData;
Graph();
explicit Graph(const std::shared_ptr<GraphData> &graph_data);
explicit Graph(std::shared_ptr<GraphData> &&graph_data);
explicit Graph(std::nullptr_t);
@@ -34,6 +35,7 @@ class MS_API Graph {
enum ModelType ModelType() const;
bool operator==(std::nullptr_t) const;
bool operator!=(std::nullptr_t) const;
private:
friend class GraphCell;


@@ -1,71 +0,0 @@
/**
* Copyright 2021 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef MINDSPORE_INCLUDE_API_LITE_CONTEXT_H
#define MINDSPORE_INCLUDE_API_LITE_CONTEXT_H
#include <string>
#include <memory>
#include <map>
#include <any>
#include "include/api/types.h"
#include "include/lite_types.h"
namespace mindspore {
namespace lite {
class Allocator;
} // namespace lite
struct MS_API Context {
public:
static void Clear(const std::shared_ptr<Context> &context);
static void SetAsDefault(const std::shared_ptr<Context> &context);
static void SetVendorName(const std::shared_ptr<Context> &context, const std::string &name);
static std::string GetVendorName(const std::shared_ptr<Context> &context);
static void SetThreadNum(const std::shared_ptr<Context> &context, int num);
static int GetThreadNum(const std::shared_ptr<Context> &context);
static void SetAllocator(const std::shared_ptr<Context> &context, std::shared_ptr<lite::Allocator> alloc);
static std::shared_ptr<lite::Allocator> GetAllocator(const std::shared_ptr<Context> &context);
static void ConfigCPU(const std::shared_ptr<Context> &context, bool config);
static bool IfCPUEnabled(const std::shared_ptr<Context> &context);
static void ConfigCPUFp16(const std::shared_ptr<Context> &context, bool config);
static bool IfCPUFp16Enabled(const std::shared_ptr<Context> &context);
static void SetCPUBindMode(const std::shared_ptr<Context> &context, lite::CpuBindMode mode);
static lite::CpuBindMode GetCPUBindMode(const std::shared_ptr<Context> &context);
static void ConfigGPU(const std::shared_ptr<Context> &context, bool config);
static bool IfGPUEnabled(const std::shared_ptr<Context> &context);
static void ConfigGPUFp16(const std::shared_ptr<Context> &context, bool config);
static bool IfGPUFp16Enabled(const std::shared_ptr<Context> &context);
static void ConfigNPU(const std::shared_ptr<Context> &context, bool config);
static bool IfNPUEnabled(const std::shared_ptr<Context> &context);
static void SetNPUFrequency(const std::shared_ptr<Context> &context, int freq);
static int GetNPUFrequency(const std::shared_ptr<Context> &context);
private:
std::map<std::string, std::any> context_;
};
} // namespace mindspore
#endif // MINDSPORE_INCLUDE_API_LITE_CONTEXT_H


@@ -24,39 +24,57 @@
#include "include/api/status.h"
#include "include/api/types.h"
#include "include/api/graph.h"
#include "include/api/context.h"
#include "include/api/cell.h"
#include "include/api/dual_abi_helper.h"
namespace mindspore {
class ModelImpl;
struct Context;
class MS_API Model {
public:
explicit Model(const std::vector<Output> &network, const std::shared_ptr<Context> &model_context = nullptr);
explicit Model(const GraphCell &graph, const std::shared_ptr<Context> &model_context = nullptr);
Model();
~Model();
Model(const Model &) = delete;
void operator=(const Model &) = delete;
Status Build();
Status Build(GraphCell graph, const std::shared_ptr<Context> &model_context = nullptr);
Status Resize(const std::vector<MSTensor> &inputs, const std::vector<std::vector<int64_t>> &dims);
Status Predict(const std::vector<MSTensor> &inputs, std::vector<MSTensor> *outputs);
std::vector<MSTensor> GetInputs();
std::vector<MSTensor> GetOutputs();
inline MSTensor GetInputByTensorName(const std::string &tensor_name);
static inline bool CheckModelSupport(const std::string &device_type, ModelType model_type);
std::vector<MSTensor> GetOutputs();
inline std::vector<std::string> GetOutputTensorNames();
inline MSTensor GetOutputByTensorName(const std::string &tensor_name);
inline std::vector<MSTensor> GetOutputsByNodeName(const std::string &tensor_name);
static bool CheckModelSupport(enum DeviceType device_type, ModelType model_type);
private:
// api without std::string
static bool CheckModelSupport(const std::vector<char> &device_type, ModelType model_type);
MSTensor GetInputByTensorName(const std::vector<char> &tensor_name);
std::vector<std::vector<char>> GetOutputTensorNamesChar();
MSTensor GetOutputByTensorName(const std::vector<char> &tensor_name);
std::vector<MSTensor> GetOutputsByNodeName(const std::vector<char> &node_name);
std::shared_ptr<ModelImpl> impl_;
};
bool Model::CheckModelSupport(const std::string &device_type, ModelType model_type) {
return CheckModelSupport(StringToChar(device_type), model_type);
MSTensor Model::GetInputByTensorName(const std::string &tensor_name) {
return GetInputByTensorName(StringToChar(tensor_name));
}
std::vector<std::string> Model::GetOutputTensorNames() { return VectorCharToString(GetOutputTensorNamesChar()); }
MSTensor Model::GetOutputByTensorName(const std::string &tensor_name) {
return GetOutputByTensorName(StringToChar(tensor_name));
}
std::vector<MSTensor> Model::GetOutputsByNodeName(const std::string &tensor_name) {
return GetOutputsByNodeName(StringToChar(tensor_name));
}
} // namespace mindspore
#endif // MINDSPORE_INCLUDE_API_MODEL_H
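With this change Model is default-constructed and built afterwards: Build takes a GraphCell plus the new Context shown earlier, replacing the old graph-taking constructors. A hedged sketch of the call sequence, assuming GraphCell can wrap a Graph and that kSuccess comes from include/api/status.h (neither appears in this hunk):

```cpp
#include <vector>
#include "include/api/context.h"
#include "include/api/model.h"

mindspore::Status RunOnce(const mindspore::Graph &graph,
                          const std::shared_ptr<mindspore::Context> &context) {
  mindspore::Model model;  // Model() replaces Model(graph, context)
  auto status = model.Build(mindspore::GraphCell(graph), context);
  if (status != mindspore::kSuccess) {  // kSuccess assumed from include/api/status.h
    return status;
  }
  std::vector<mindspore::MSTensor> inputs = model.GetInputs();
  // ... fill the input tensors here ...
  std::vector<mindspore::MSTensor> outputs;
  return model.Predict(inputs, &outputs);
}
```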


@@ -29,19 +29,19 @@
namespace mindspore {
class MS_API Serialization {
public:
static Graph LoadModel(const void *model_data, size_t data_size, ModelType model_type);
inline static Graph LoadModel(const std::string &file, ModelType model_type);
static Status Load(const void *model_data, size_t data_size, ModelType model_type, Graph *graph);
inline static Status Load(const std::string &file, ModelType model_type, Graph *graph);
static Status LoadCheckPoint(const std::string &ckpt_file, std::map<std::string, Buffer> *parameters);
static Status SetParameters(const std::map<std::string, Buffer> &parameters, Model *model);
static Status ExportModel(const Model &model, ModelType model_type, Buffer *model_data);
static Status ExportModel(const Model &model, ModelType model_type, const std::string &model_file);
private:
static Graph LoadModel(const std::vector<char> &file, ModelType model_type);
static Status Load(const std::vector<char> &file, ModelType model_type, Graph *graph);
};
Graph Serialization::LoadModel(const std::string &file, ModelType model_type) {
return LoadModel(StringToChar(file), model_type);
Status Serialization::Load(const std::string &file, ModelType model_type, Graph *graph) {
return Load(StringToChar(file), model_type, graph);
}
} // namespace mindspore
#endif // MINDSPORE_INCLUDE_API_SERIALIZATION_H
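LoadModel, which returned a Graph by value, becomes a Status-returning Load with the Graph as an out-parameter, so callers can inspect the failure instead of probing for a null graph; the default Graph() constructor added above makes that pattern possible. A sketch with an illustrative file name, assuming the kMindIR model type from include/api/types.h:

```cpp
#include "include/api/graph.h"
#include "include/api/serialization.h"

mindspore::Status LoadMindIR(mindspore::Graph *graph) {
  // Old style: *graph = Serialization::LoadModel("net.mindir", mindspore::ModelType::kMindIR);
  // New style: the Status return carries the error details.
  return mindspore::Serialization::Load("net.mindir", mindspore::ModelType::kMindIR, graph);
}
```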


@@ -43,15 +43,19 @@ class MS_API MSTensor {
public:
class Impl;
static inline MSTensor CreateTensor(const std::string &name, DataType type, const std::vector<int64_t> &shape,
const void *data, size_t data_len) noexcept;
static inline MSTensor CreateRefTensor(const std::string &name, DataType type, const std::vector<int64_t> &shape,
const void *data, size_t data_len) noexcept;
static inline MSTensor *CreateTensor(const std::string &name, DataType type, const std::vector<int64_t> &shape,
const void *data, size_t data_len) noexcept;
static inline MSTensor *CreateRefTensor(const std::string &name, DataType type, const std::vector<int64_t> &shape,
const void *data, size_t data_len) noexcept;
static inline MSTensor *StringsToTensor(const std::string &name, const std::vector<std::string> &str);
static inline std::vector<std::string> TensorToStrings(const MSTensor &tensor);
static void DestroyTensorPtr(MSTensor *tensor) noexcept;
MSTensor();
explicit MSTensor(const std::shared_ptr<Impl> &impl);
inline MSTensor(const std::string &name, DataType type, const std::vector<int64_t> &shape, const void *data,
size_t data_len);
explicit MSTensor(std::nullptr_t);
~MSTensor();
inline std::string Name() const;
@@ -65,21 +69,24 @@ class MS_API MSTensor {
bool IsDevice() const;
MSTensor Clone() const;
MSTensor *Clone() const;
bool operator==(std::nullptr_t) const;
bool operator!=(std::nullptr_t) const;
private:
// api without std::string
static MSTensor CreateTensor(const std::vector<char> &name, enum DataType type, const std::vector<int64_t> &shape,
const void *data, size_t data_len) noexcept;
static MSTensor CreateRefTensor(const std::vector<char> &name, enum DataType type, const std::vector<int64_t> &shape,
const void *data, size_t data_len) noexcept;
static MSTensor *CreateTensor(const std::vector<char> &name, enum DataType type, const std::vector<int64_t> &shape,
const void *data, size_t data_len) noexcept;
static MSTensor *CreateRefTensor(const std::vector<char> &name, enum DataType type, const std::vector<int64_t> &shape,
const void *data, size_t data_len) noexcept;
static MSTensor *CharStringsToTensor(const std::vector<char> &name, const std::vector<std::vector<char>> &str);
static std::vector<std::vector<char>> TensorToStringChars(const MSTensor &tensor);
MSTensor(const std::vector<char> &name, enum DataType type, const std::vector<int64_t> &shape, const void *data,
size_t data_len);
std::vector<char> CharName() const;
friend class ModelImpl;
explicit MSTensor(std::nullptr_t);
std::shared_ptr<Impl> impl_;
};
@@ -103,16 +110,24 @@ class MS_API Buffer {
std::shared_ptr<Impl> impl_;
};
MSTensor MSTensor::CreateTensor(const std::string &name, enum DataType type, const std::vector<int64_t> &shape,
const void *data, size_t data_len) noexcept {
MSTensor *MSTensor::CreateTensor(const std::string &name, enum DataType type, const std::vector<int64_t> &shape,
const void *data, size_t data_len) noexcept {
return CreateTensor(StringToChar(name), type, shape, data, data_len);
}
MSTensor MSTensor::CreateRefTensor(const std::string &name, enum DataType type, const std::vector<int64_t> &shape,
const void *data, size_t data_len) noexcept {
MSTensor *MSTensor::CreateRefTensor(const std::string &name, enum DataType type, const std::vector<int64_t> &shape,
const void *data, size_t data_len) noexcept {
return CreateRefTensor(StringToChar(name), type, shape, data, data_len);
}
MSTensor *MSTensor::StringsToTensor(const std::string &name, const std::vector<std::string> &str) {
return CharStringsToTensor(StringToChar(name), VectorStringToChar(str));
}
std::vector<std::string> MSTensor::TensorToStrings(const MSTensor &tensor) {
return VectorCharToString(TensorToStringChars(tensor));
}
MSTensor::MSTensor(const std::string &name, enum DataType type, const std::vector<int64_t> &shape, const void *data,
size_t data_len)
: MSTensor(StringToChar(name), type, shape, data, data_len) {}
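CreateTensor and CreateRefTensor now return heap-allocated MSTensor pointers (as does Clone), paired with the new DestroyTensorPtr, and string tensors gain StringsToTensor/TensorToStrings. A sketch of the ownership contract; the tensor name, dtype, and shape are illustrative, and kNumberTypeFloat32 is assumed to come from include/api/data_type.h:

```cpp
#include <vector>
#include "include/api/types.h"

void TensorSketch() {
  std::vector<float> data(6, 0.5f);
  // The factory returns a pointer (assumed nullptr on failure) instead of a value.
  mindspore::MSTensor *tensor = mindspore::MSTensor::CreateTensor(
      "x", mindspore::DataType::kNumberTypeFloat32, {2, 3},
      data.data(), data.size() * sizeof(float));
  if (tensor == nullptr) {
    return;
  }
  // ... use *tensor ...
  mindspore::MSTensor::DestroyTensorPtr(tensor);  // pairs with the pointer-returning factories
}
```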


@@ -1,134 +0,0 @@
/**
* Copyright 2019 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef MINDSPORE_INFERENCE_LOG_H_
#define MINDSPORE_INFERENCE_LOG_H_
#include <stdarg.h>
#include <stdint.h>
#include <string>
#include <sstream>
#include <memory>
#include <iostream>
#include <chrono>
#include <vector>
#ifndef ENABLE_ACL
#include "mindspore/core/utils/log_adapter.h"
#else // ENABLE_ACL
#include "acl/acl.h"
#endif
namespace mindspore::inference {
class LogStream {
public:
LogStream() { sstream_ = std::make_shared<std::stringstream>(); }
~LogStream() = default;
template <typename T>
LogStream &operator<<(const T &val) noexcept {
(*sstream_) << val;
return *this;
}
template <typename T>
LogStream &operator<<(const std::vector<T> &val) noexcept {
(*sstream_) << "[";
for (size_t i = 0; i < val.size(); i++) {
(*this) << val[i];
if (i + 1 < val.size()) {
(*sstream_) << ", ";
}
}
(*sstream_) << "]";
return *this;
}
LogStream &operator<<(std::ostream &func(std::ostream &os)) noexcept {
(*sstream_) << func;
return *this;
}
friend class LogWriter;
friend class Status;
private:
std::shared_ptr<std::stringstream> sstream_;
};
#ifndef ENABLE_ACL
#define MSI_LOG(level) MS_LOG(level)
#define MSI_LOG_DEBUG MSI_LOG(DEBUG)
#define MSI_LOG_INFO MSI_LOG(INFO)
#define MSI_LOG_WARNING MSI_LOG(WARNING)
#define MSI_LOG_ERROR MSI_LOG(ERROR)
#define MSI_ASSERT(item) MS_ASSERT(item)
#else // ENABLE_ACL
class LogWriter {
public:
LogWriter(const char *file, int line, const char *func, aclLogLevel log_level)
: file_(file), line_(line), func_(func), log_level_(log_level) {}
~LogWriter() = default;
void operator<(const LogStream &stream) const noexcept __attribute__((visibility("default"))) {
std::ostringstream msg;
msg << stream.sstream_->rdbuf();
OutputLog(msg);
}
private:
void OutputLog(const std::ostringstream &msg) const { aclAppLog(log_level_, func_, file_, line_, msg.str().c_str()); }
const char *file_;
int line_;
const char *func_;
aclLogLevel log_level_;
};
#define MSILOG_IF(level) inference::LogWriter(__FILE__, __LINE__, __FUNCTION__, ACL_##level) < inference::LogStream()
#define MSI_LOG(level) MSI_LOG_##level
#define MSI_LOG_DEBUG MSILOG_IF(DEBUG)
#define MSI_LOG_INFO MSILOG_IF(INFO)
#define MSI_LOG_WARNING MSILOG_IF(WARNING)
#define MSI_LOG_ERROR MSILOG_IF(ERROR)
#define MSI_ASSERT(item)
#endif // ENABLE_ACL
#define MSI_TIME_STAMP_START(name) auto time_start_##name = std::chrono::steady_clock::now();
#define MSI_TIME_STAMP_END(name) \
{ \
auto time_end_##name = std::chrono::steady_clock::now(); \
auto time_cost = std::chrono::duration<double, std::milli>(time_end_##name - time_start_##name).count(); \
MSI_LOG_INFO << #name " Time Cost # " << time_cost << " ms ---------------------"; \
}
#define INFER_STATUS(code) inference::Status(code) < inference::LogStream()
#define ERROR_INFER_STATUS(status, type, msg) \
MSI_LOG_ERROR << msg; \
status = inference::Status(type, msg)
} // namespace mindspore::inference
#endif // MINDSPORE_INFERENCE_LOG_H_


@@ -1,217 +0,0 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef MINDSPORE_INCLUDE_INFER_TENSOR_H_
#define MINDSPORE_INCLUDE_INFER_TENSOR_H_
#include <utility>
#include <vector>
#include <memory>
#include <numeric>
#include <map>
#include <functional>
#include "securec/include/securec.h"
#include "include/infer_log.h"
namespace mindspore {
#define MS_API __attribute__((visibility("default")))
namespace inference {
enum DataType {
kMSI_Unknown = 0,
kMSI_Bool = 1,
kMSI_Int8 = 2,
kMSI_Int16 = 3,
kMSI_Int32 = 4,
kMSI_Int64 = 5,
kMSI_Uint8 = 6,
kMSI_Uint16 = 7,
kMSI_Uint32 = 8,
kMSI_Uint64 = 9,
kMSI_Float16 = 10,
kMSI_Float32 = 11,
kMSI_Float64 = 12,
};
class InferTensorBase {
public:
InferTensorBase() = default;
virtual ~InferTensorBase() = default;
virtual DataType data_type() const = 0;
virtual void set_data_type(DataType type) = 0;
virtual std::vector<int64_t> shape() const = 0;
virtual void set_shape(const std::vector<int64_t> &shape) = 0;
virtual const void *data() const = 0;
virtual size_t data_size() const = 0;
virtual bool resize_data(size_t data_len) = 0;
virtual void *mutable_data() = 0;
bool set_data(const void *data, size_t data_len) {
resize_data(data_len);
if (mutable_data() == nullptr) {
MSI_LOG_ERROR << "set data failed, data len " << data_len;
return false;
}
if (data_size() != data_len) {
MSI_LOG_ERROR << "set data failed, tensor current data size " << data_size() << " not match data len "
<< data_len;
return false;
}
if (data_len == 0) {
return true;
}
auto ret = memcpy_s(mutable_data(), data_size(), data, data_len);
if (ret != 0) {
MSI_LOG_ERROR << "Set data memcpy_s failed";
return false;
}
return true;
}
int64_t ElementNum() const {
std::vector<int64_t> shapex = shape();
return std::accumulate(shapex.begin(), shapex.end(), 1LL, std::multiplies<int64_t>());
}
int GetTypeSize(DataType type) const {
const std::map<DataType, size_t> type_size_map{
{kMSI_Bool, sizeof(bool)}, {kMSI_Float64, sizeof(double)}, {kMSI_Int8, sizeof(int8_t)},
{kMSI_Uint8, sizeof(uint8_t)}, {kMSI_Int16, sizeof(int16_t)}, {kMSI_Uint16, sizeof(uint16_t)},
{kMSI_Int32, sizeof(int32_t)}, {kMSI_Uint32, sizeof(uint32_t)}, {kMSI_Int64, sizeof(int64_t)},
{kMSI_Uint64, sizeof(uint64_t)}, {kMSI_Float16, sizeof(uint16_t)}, {kMSI_Float32, sizeof(float)},
};
auto it = type_size_map.find(type);
if (it != type_size_map.end()) {
return it->second;
}
return 0;
}
};
class InferTensor : public InferTensorBase {
public:
DataType type_;
std::vector<int64_t> shape_;
std::vector<uint8_t> data_;
public:
InferTensor() = default;
~InferTensor() = default;
InferTensor(DataType type, std::vector<int64_t> shape, const void *data, size_t data_len) {
set_data_type(type);
set_shape(shape);
set_data(data, data_len);
}
void set_data_type(DataType type) override { type_ = type; }
DataType data_type() const override { return type_; }
void set_shape(const std::vector<int64_t> &shape) override { shape_ = shape; }
std::vector<int64_t> shape() const override { return shape_; }
const void *data() const override { return data_.data(); }
size_t data_size() const override { return data_.size(); }
bool resize_data(size_t data_len) override {
data_.resize(data_len);
return true;
}
void *mutable_data() override { return data_.data(); }
};
class InferImagesBase {
public:
InferImagesBase() = default;
virtual ~InferImagesBase() = default;
virtual size_t batch_size() const = 0;
virtual bool get(size_t index, const void *&pic_buffer, uint32_t &pic_size) const = 0;
virtual size_t input_index() const = 0; // the index of images as input in model
};
class RequestBase {
public:
RequestBase() = default;
virtual ~RequestBase() = default;
virtual size_t size() const = 0;
virtual const InferTensorBase *operator[](size_t index) const = 0;
};
class ImagesRequestBase {
public:
ImagesRequestBase() = default;
virtual ~ImagesRequestBase() = default;
virtual size_t size() const = 0;
virtual const InferImagesBase *operator[](size_t index) const = 0;
};
class ReplyBase {
public:
ReplyBase() = default;
virtual ~ReplyBase() = default;
virtual size_t size() const = 0;
virtual InferTensorBase *operator[](size_t index) = 0;
virtual const InferTensorBase *operator[](size_t index) const = 0;
virtual InferTensorBase *add() = 0;
virtual void clear() = 0;
};
class VectorInferTensorWrapReply : public ReplyBase {
public:
explicit VectorInferTensorWrapReply(std::vector<InferTensor> &tensor_list) : tensor_list_(tensor_list) {}
~VectorInferTensorWrapReply() = default;
size_t size() const { return tensor_list_.size(); }
InferTensorBase *operator[](size_t index) {
if (index >= tensor_list_.size()) {
MSI_LOG_ERROR << "visit invalid index " << index << " total size " << tensor_list_.size();
return nullptr;
}
return &(tensor_list_[index]);
}
const InferTensorBase *operator[](size_t index) const {
if (index >= tensor_list_.size()) {
MSI_LOG_ERROR << "visit invalid index " << index << " total size " << tensor_list_.size();
return nullptr;
}
return &(tensor_list_[index]);
}
InferTensorBase *add() {
tensor_list_.push_back(InferTensor());
return &(tensor_list_.back());
}
void clear() { tensor_list_.clear(); }
std::vector<InferTensor> &tensor_list_;
};
class VectorInferTensorWrapRequest : public RequestBase {
public:
explicit VectorInferTensorWrapRequest(const std::vector<InferTensor> &tensor_list) : tensor_list_(tensor_list) {}
~VectorInferTensorWrapRequest() = default;
size_t size() const { return tensor_list_.size(); }
const InferTensorBase *operator[](size_t index) const {
if (index >= tensor_list_.size()) {
MSI_LOG_ERROR << "visit invalid index " << index << " total size " << tensor_list_.size();
return nullptr;
}
return &(tensor_list_[index]);
}
const std::vector<InferTensor> &tensor_list_;
};
} // namespace inference
} // namespace mindspore
#endif // MINDSPORE_INCLUDE_INFER_TENSOR_H_

View File

@@ -1,86 +0,0 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef MINDSPORE_INCLUDE_MS_SESSION_H
#define MINDSPORE_INCLUDE_MS_SESSION_H
#include <memory>
#include <vector>
#include <string>
#include "include/infer_tensor.h"
#include "include/infer_log.h"
namespace mindspore {
namespace inference {
enum StatusCode { SUCCESS = 0, FAILED, INVALID_INPUTS };
class Status {
public:
Status() : status_code_(FAILED) {}
Status(enum StatusCode status_code, const std::string &status_msg = "")
: status_code_(status_code), status_msg_(status_msg) {}
~Status() = default;
bool IsSuccess() const { return status_code_ == SUCCESS; }
enum StatusCode StatusCode() const { return status_code_; }
std::string StatusMessage() const { return status_msg_; }
bool operator==(const Status &other) const { return status_code_ == other.status_code_; }
bool operator==(enum StatusCode other_code) const { return status_code_ == other_code; }
bool operator!=(const Status &other) const { return status_code_ != other.status_code_; }
bool operator!=(enum StatusCode other_code) const { return status_code_ != other_code; }
operator bool() const = delete;
Status &operator<(const LogStream &stream) noexcept __attribute__((visibility("default"))) {
status_msg_ = stream.sstream_->str();
return *this;
}
private:
enum StatusCode status_code_;
std::string status_msg_;
};
class MS_API InferSession {
public:
InferSession() = default;
virtual ~InferSession() = default;
virtual Status InitEnv(const std::string &device_type, uint32_t device_id) = 0;
virtual Status FinalizeEnv() = 0;
virtual Status LoadModelFromFile(const std::string &file_name, uint32_t &model_id) = 0;
virtual Status UnloadModel(uint32_t model_id) = 0;
// override this method to avoid request/reply data copy
virtual Status ExecuteModel(uint32_t model_id, const RequestBase &request, ReplyBase &reply) = 0;
virtual Status ExecuteModel(uint32_t model_id, const std::vector<InferTensor> &inputs,
std::vector<InferTensor> &outputs) {
VectorInferTensorWrapRequest request(inputs);
VectorInferTensorWrapReply reply(outputs);
return ExecuteModel(model_id, request, reply);
}
// default not support input data preprocess(decode, resize, crop, crop&paste, etc.)
virtual Status ExecuteModel(uint32_t /*model_id*/,
const ImagesRequestBase & /*images_inputs*/, // images for preprocess
const RequestBase & /*request*/, ReplyBase & /*reply*/) {
return FAILED;
}
virtual Status GetModelInputsInfo(uint32_t graph_id, std::vector<inference::InferTensor> *tensor_list) const {
Status status(SUCCESS);
return status;
}
static std::shared_ptr<InferSession> CreateSession(const std::string &device, uint32_t device_id);
};
} // namespace inference
} // namespace mindspore
#endif // MINDSPORE_INCLUDE_MS_SESSION_H


@@ -15,7 +15,7 @@
""".. MindSpore package."""
from ._check_version import check_version_and_env_config
from . import common, train
from . import common, train, log
from .common import *
from .ops import _op_impl
from .train import *


@@ -46,6 +46,7 @@ class GPUEnvChecker(EnvChecker):
def __init__(self):
self.version = ["10.1"]
self.lib_key_to_lib_name = {'libcu': 'libcuda.so'}
# env
self.path = os.getenv("PATH")
self.ld_lib_path = os.getenv("LD_LIBRARY_PATH")
@@ -131,25 +132,32 @@ class GPUEnvChecker(EnvChecker):
"""Get gpu lib path by ldd command."""
path_list = []
current_path = os.path.split(os.path.realpath(__file__))[0]
ldd_result = subprocess.run(["ldd " + current_path + "/_c_expression*.so* | grep " + lib_name],
timeout=3, text=True, capture_output=True, check=False, shell=True)
if ldd_result.returncode:
logger.warning(f"{lib_name} so(need by mndspore-gpu) is not found, please confirm that "
f"_c_experssion.so depend on {lib_name}, "
f"and _c_expression.so in directory:{current_path}")
try:
ldd_result = subprocess.run(["ldd " + current_path + "/_c_expression*.so* | grep " + lib_name],
timeout=10, text=True, capture_output=True, check=False, shell=True)
if ldd_result.returncode:
logger.error(f"{self.lib_key_to_lib_name[lib_name]} (need by mindspore-gpu) is not found, please "
f"confirm that _c_expression.so is in directory:{current_path} and the correct cuda "
"version has been installed, you can refer to the installation "
"guidelines: https://www.mindspore.cn/install")
return path_list
result = ldd_result.stdout
for i in result.split('\n'):
path = i.partition("=>")[2]
if path.lower().find("not found") > 0:
logger.warning(f"Cuda {self.version} version(need by mindspore-gpu) is not found, please confirm "
"that the path of cuda is set to the env LD_LIBRARY_PATH, please refer to the "
"installation guidelines: https://www.mindspore.cn/install")
continue
path = path.partition(lib_name)[0]
if path:
path_list.append(os.path.abspath(path.strip() + "../"))
return np.unique(path_list)
except subprocess.TimeoutExpired:
logger.warning("Failed to check cuda version due to the ldd command timeout, please confirm that "
"the correct cuda version has been installed, you can refer to the "
"installation guidelines: https://www.mindspore.cn/install")
return path_list
result = ldd_result.stdout
for i in result.split('\n'):
path = i.partition("=>")[2]
if path.lower().find("not found") > 0:
logger.warning(f"Cuda {self.version} version(need by mindspore-gpu) is not found, please confirm "
"that the path of cuda is set to the env LD_LIBRARY_PATH, please refer to the "
"installation guidelines: https://www.mindspore.cn/install")
continue
path = path.partition(lib_name)[0]
if path:
path_list.append(os.path.abspath(path.strip() + "../"))
return np.unique(path_list)
def _read_version(self, file_path):
"""Get gpu version info in version.txt."""
@@ -166,7 +174,7 @@ class AscendEnvChecker(EnvChecker):
"""ascend environment check"""
def __init__(self):
self.version = ["1.77.22.0.220"]
self.version = ["1.77.22.6.220"]
atlas_nnae_version = "/usr/local/Ascend/nnae/latest/fwkacllib/version.info"
atlas_toolkit_version = "/usr/local/Ascend/ascend-toolkit/latest/fwkacllib/version.info"
hisi_fwk_version = "/usr/local/Ascend/fwkacllib/version.info"


@@ -100,7 +100,7 @@ def _check_3d_int_or_tuple(arg_name, arg_value, prim_name, allow_five=False, ret
def _raise_message(third_one_flag=False, three_input_flag=False):
if third_one_flag:
raise ValueError(f"For '{prim_name}' the depth of attr '{arg_name}' should be 1, but got {arg_value[-3]}")
raise ValueError(f"For '{prim_name}' the depth of attr '{arg_name}' should be 1, but got {ret_value[-3]}")
if three_input_flag:
raise ValueError(f"For '{prim_name}' attr '{arg_name}' should be an positive int number or a tuple of "
f"three positive int numbers, but got {arg_value}")
@@ -110,8 +110,6 @@ def _check_3d_int_or_tuple(arg_name, arg_value, prim_name, allow_five=False, ret
def _get_return_value():
if isinstance(arg_value, int):
ret = (1, 1, arg_value, arg_value, arg_value) if ret_five else (arg_value, arg_value, arg_value)
if third_one:
ret = (1, 1, 1, arg_value, arg_value) if ret_five else (1, arg_value, arg_value)
elif len(arg_value) == 3:
ret = (1, 1, arg_value[0], arg_value[1], arg_value[2]) if ret_five else arg_value
elif len(arg_value) == 5:


@@ -23,11 +23,40 @@ from mindspore._extends.graph_kernel.model.model import GraphKernelUnsupportedEx
def create_expander(expand_info):
"""Create an expander according to op name"""
expander_list = {
"AssignAdd": expanders.AssignAdd,
"BiasAdd": expanders.BiasAdd,
"BiasAddGrad": expanders.BiasAddGrad,
"ClipByNormNoDivSum": expanders.ClipByNormNoDivSum,
"DropoutGrad": expanders.DropoutGrad,
"FusedAdam": expanders.FusedAdam,
"FusedAdamWeightDecay": expanders.FusedAdamWeightDecay,
"GeLU": expanders.GeLU,
"GeLUGrad": expanders.GeLUGrad,
"GkDropout": expanders.GkDropout,
"LayerNorm": expanders.LayerNorm,
"LayerNormGrad": expanders.LayerNormGrad,
"LogSoftmax": expanders.LogSoftmax,
"LogSoftmaxGrad": expanders.LogSoftmaxGrad,
"MaximumGrad": expanders.MaximumGrad,
"MinimumGrad": expanders.MinimumGrad,
"ReduceMean": expanders.ReduceMean,
"Softmax": expanders.Softmax,
"Sigmoid": expanders.Sigmoid,
"SigmoidGrad": expanders.SigmoidGrad,
"SigmoidCrossEntropyWithLogits": expanders.SigmoidCrossEntropyWithLogits,
"SigmoidCrossEntropyWithLogitsGrad": expanders.SigmoidCrossEntropyWithLogitsGrad,
"SoftmaxCrossEntropyWithLogits": expanders.SoftmaxCrossEntropyWithLogits,
"SqrtGrad": expanders.SqrtGrad,
"Square": expanders.Square,
"TanhGrad": expanders.TanhGrad,
"Tile": expanders.Tile,
"LambApplyOptimizerAssign": expanders.LambApplyOptimizerAssign,
}
op_name = str(expand_info['name'])
if not hasattr(expanders, op_name):
if op_name not in expander_list:
raise GraphKernelUnsupportedException("Generator do not support op: {}".format(op_name))
expander = getattr(expanders, op_name)
return expander(expand_info)
return expander_list[op_name](expand_info)
def extract_expand_info(kernel_info):


@@ -14,6 +14,7 @@
# ============================================================================
"""expanders init"""
from .assign_add import AssignAdd
from .bias_add import BiasAdd
from .bias_add_grad import BiasAddGrad
from .clip_by_norm_no_div_sum import ClipByNormNoDivSum
@@ -31,7 +32,13 @@ from .maximum_grad import MaximumGrad
from .minimum_grad import MinimumGrad
from .reduce_mean import ReduceMean
from .softmax import Softmax
from .sigmoid import Sigmoid
from .sigmoid_grad import SigmoidGrad
from .sigmoid_cross_entropy_with_logits import SigmoidCrossEntropyWithLogits
from .sigmoid_cross_entropy_with_logits_grad import SigmoidCrossEntropyWithLogitsGrad
from .softmax_cross_entropy_with_logits import SoftmaxCrossEntropyWithLogits
from .sqrt_grad import SqrtGrad
from .square import Square
from .tanh_grad import TanhGrad
from .tile import Tile
from .lamb_apply_optimizer_assign import LambApplyOptimizerAssign


@@ -66,19 +66,18 @@ class Expander:
class ExpanderInfoValidator:
"""ExpanderInfoValidator is the utility class which defines the validator decorator for expanders"""
# pylint: disable=W0211
@staticmethod
def _add_check_function(cls, func):
def _add_check_function(kls, func):
"""
Rewrite the function `_check` in class Expander
to append the new `func` after the original checks.
"""
old_check = getattr(cls, "_check")
old_check = getattr(kls, "_check")
def new_check(obj):
old_check(obj)
func(obj)
setattr(cls, "_check", new_check)
setattr(kls, "_check", new_check)
@staticmethod
def add_format(*input_format):
@@ -112,7 +111,7 @@ class ExpanderInfoValidator:
return wrapper
@staticmethod
def check_all_formats_same(cls):
def check_all_formats_same(kls):
"""Check that all formats are the same"""
def _check_format(obj):
inp_formats = [inp['format'] for inp in obj.inputs]
@@ -122,10 +121,10 @@ class ExpanderInfoValidator:
','.join(inp_formats), obj.name))
def wrapper(*args, **kargs):
if not issubclass(cls, Expander):
raise Exception("{} should be a subclass of Expander.".format(cls.__name__))
ExpanderInfoValidator._add_check_function(cls, _check_format)
return cls(*args, **kargs)
if not issubclass(kls, Expander):
raise Exception("{} should be a subclass of Expander.".format(kls.__name__))
ExpanderInfoValidator._add_check_function(kls, _check_format)
return kls(*args, **kargs)
return wrapper

View File

@ -0,0 +1,30 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ===========================================================================
"""generate json desc for assign_add"""
from ._utils import Expander, ExpanderInfoValidator as VLD
@VLD.check_all_formats_same
class AssignAdd(Expander):
"""AssignAdd expander"""
def _expand(self, graph_builder):
param, x = self.inputs
next_para = graph_builder.emit('Add', [param, x])
param_result = graph_builder.emit(
'InplaceAssign', [param, next_para, next_para], attrs={'fake_output': True})
return param_result
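For illustration only, the expansion above amounts to an in-place accumulation; a minimal NumPy sketch of the same semantics (hypothetical names, not the MindSpore API):

import numpy as np

def assign_add(param, x):
    # Add followed by InplaceAssign: accumulate x into param and return param.
    param += x  # param stands in for a mutable Parameter buffer
    return param

p = np.ones(3, dtype=np.float32)
print(assign_add(p, np.full(3, 2.0, dtype=np.float32)))  # [3. 3. 3.]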

View File

@ -23,7 +23,7 @@ class GeLU(Expander):
def _expand(self, graph_builder):
# the calculation formulas are:
# gelu(x) is 0.5 * x * (1.0 + tanh(y))
# gelu of x is 0.5 * x * (1.0 + tanh(y))
# y is sqrt(2.0 / pi) * (x + 0.044715 * x * x * x)
input_x = self.inputs[0]
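A minimal NumPy sketch of this tanh approximation, assuming only the formulas in the comments above:

import numpy as np

def gelu_tanh(x):
    # y = sqrt(2/pi) * (x + 0.044715 * x^3); gelu(x) = 0.5 * x * (1 + tanh(y))
    y = np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + np.tanh(y))

print(gelu_tanh(np.array([-1.0, 0.0, 1.0])))  # ~[-0.1588, 0., 0.8412]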

View File

@ -25,7 +25,7 @@ class GeLUGrad(Expander):
def _expand(self, graph_builder):
# the calculation formulas are:
# gelu_grad(dy, x) is dy * y'
# gelu_grad of dy and x is dy * y'
# y' is 0.5 * (1.0 + tanh(tanh_para)) + 0.5 * x * (1.0 - tanh(tanh_para) * tanh(tanh_para)) * mul_right
# tanh_para is sqrt(2.0 / pi) * (x + 0.044715 * x * x * x)
# mul_right is sqrt(2.0 / pi) * (1 + 3 * 0.044715 * x * x)
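Restated as a NumPy sketch of dy * y' from the three comment terms above (illustrative only, not the expander's API):

import numpy as np

def gelu_grad_tanh(dy, x):
    c = np.sqrt(2.0 / np.pi)
    tanh_para = np.tanh(c * (x + 0.044715 * x ** 3))
    mul_right = c * (1.0 + 3.0 * 0.044715 * x * x)
    y_prime = 0.5 * (1.0 + tanh_para) + 0.5 * x * (1.0 - tanh_para ** 2) * mul_right
    return dy * y_prime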

View File

@ -0,0 +1,76 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ===========================================================================
"""generate json desc for LambApplyOptimizerAssign"""
from ._utils import Expander, ExpanderInfoValidator as VLD
@VLD.check_all_formats_same
class LambApplyOptimizerAssign(Expander):
"""LambApplyOptimizerAssign expander"""
def _expand(self, graph_builder):
[grad, inputv, inputm, input_param, beta_1, one_minus_beta_1, beta_2, one_minus_beta_2, epsilon, steps,
do_use_weight, weight_decay_rate] = self.inputs
# next_v
square_grad = graph_builder.emit('Mul', [grad, grad])
mul_3_result = graph_builder.emit('Mul', [square_grad, one_minus_beta_2])
mul_2_result = graph_builder.emit('Mul', [inputv, beta_2])
next_v = graph_builder.emit('Add', [mul_2_result, mul_3_result])
# next_m
mul_0_result = graph_builder.emit('Mul', [inputm, beta_1])
mul_1_result = graph_builder.emit('Mul', [grad, one_minus_beta_1])
next_m = graph_builder.emit('Add', [mul_0_result, mul_1_result])
shape = next_m.shape
const_one = graph_builder.value(beta_2.dtype, 1)
beta_1_tensor = graph_builder.emit('BroadcastTo', [beta_1], attrs={'shape': shape})
beta_2_tensor = graph_builder.emit('BroadcastTo', [beta_2], attrs={'shape': shape})
# pow
beta_1_log = graph_builder.emit('Log', [beta_1_tensor])
mul_res_1 = graph_builder.emit('Mul', [beta_1_log, steps])
beta_1_steps = graph_builder.emit('Exp', [mul_res_1])
neg_beta_1_step = graph_builder.emit('Neg', [beta_1_steps])
beta1_correction = graph_builder.emit('Add', [neg_beta_1_step, const_one])
next_m_unbiased = graph_builder.emit('RealDiv', [next_m, beta1_correction])
# pow
beta_2_log = graph_builder.emit('Log', [beta_2_tensor])
mul_res_2 = graph_builder.emit('Mul', [beta_2_log, steps])
beta_2_steps = graph_builder.emit('Exp', [mul_res_2])
neg_beta_2_step = graph_builder.emit('Neg', [beta_2_steps])
beta2_correction = graph_builder.emit('Add', [neg_beta_2_step, const_one])
next_v_unbiased = graph_builder.emit('RealDiv', [next_v, beta2_correction])
# update
sqrt_next_v = graph_builder.emit('Sqrt', [next_v_unbiased])
add_2_result = graph_builder.emit('Add', [sqrt_next_v, epsilon])
update = graph_builder.emit('RealDiv', [next_m_unbiased, add_2_result])
# update do_use_weight_decay
do_use_weight_mul = graph_builder.emit('Mul', [input_param, weight_decay_rate])
do_use_weight_decay = graph_builder.emit('Mul', [do_use_weight_mul, do_use_weight])
update = graph_builder.emit('Add', [do_use_weight_decay, update])
res = [update, next_v, next_m]
return res
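Read sequentially, the emitted graph computes Adam-style moments with bias correction (the Log/Exp pair implements pow(beta, steps)) plus a weight-decay term; a hypothetical NumPy restatement, not the emitted graph itself:

import numpy as np

def lamb_update(grad, v, m, param, beta1, beta2, eps, steps, do_use_weight, weight_decay_rate):
    next_v = beta2 * v + (1.0 - beta2) * grad * grad
    next_m = beta1 * m + (1.0 - beta1) * grad
    m_hat = next_m / (1.0 - beta1 ** steps)  # exp(steps * log(beta1)) in the graph
    v_hat = next_v / (1.0 - beta2 ** steps)
    update = m_hat / (np.sqrt(v_hat) + eps)
    update = update + do_use_weight * weight_decay_rate * param
    return update, next_v, next_m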

View File

@ -73,7 +73,7 @@ class LayerNormGrad(Expander):
sum_3_mul = graph_builder.emit('Mul', [const_neg_two, x_sub_mean])
sum_3 = graph_builder.emit('ReduceSum', [sum_3_mul], attrs={'reduce_axis': norm_axis, 'keep_dims': True})
# cal dx = dx1 + dx2 + dx3
# cal dx, which is dx1 + dx2 + dx3
dx_1 = graph_builder.emit('Mul', [dy_mul_gamma, rsqrt_var_eps])
sum_1_mul_two = graph_builder.emit('Mul', [sum_1, const_two])
sum_1_mul_two_tmp = graph_builder.emit('Mul', [sum_1_mul_two, mean_cof])

View File

@ -0,0 +1,31 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ===========================================================================
"""generate json desc for Sigmoid"""
from ._utils import Expander
class Sigmoid(Expander):
"""Sigmoid expander"""
def _expand(self, graph_builder):
input_x = self.inputs[0]
# Calculate sigmoid(x)
# sigmoid of x is 1 / (1 + exp(-x))
const_one = graph_builder.value(input_x.dtype, 1.0)
neg_x = graph_builder.emit('Neg', [input_x])
exp_neg_x = graph_builder.emit('Exp', [neg_x])
add_exp = graph_builder.emit('Add', [const_one, exp_neg_x])
res = graph_builder.emit('RealDiv', [const_one, add_exp])
return res
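The Neg -> Exp -> Add -> RealDiv chain above is the plain logistic function; a one-line NumPy check:

import numpy as np

def sigmoid(x):
    # 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # ~[0.1192, 0.5, 0.8808]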

View File

@ -0,0 +1,40 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ===========================================================================
"""generate json desc for SigmoidCrossEntropyWithLogits"""
from ._utils import Expander, ExpanderInfoValidator as VLD
@VLD.check_all_formats_same
class SigmoidCrossEntropyWithLogits(Expander):
"""SigmoidCrossEntropyWithLogits expander"""
def _expand(self, graph_builder):
logits, label = self.inputs
# Calculate sigmoid_cross_entropy_with_logits(logits, label)
# formula is: -(label * log(sigmoid(logits)) + (1 - label) * log(1 - sigmoid(logits)))
const_one = graph_builder.value(logits.dtype, 1.0)
neg_x = graph_builder.emit('Neg', [logits])
exp_neg_x = graph_builder.emit('Exp', [neg_x])
add_exp = graph_builder.emit('Add', [const_one, exp_neg_x])
p = graph_builder.emit('RealDiv', [const_one, add_exp])
one_sub_p = graph_builder.emit('Sub', [const_one, p])
one_sub_label = graph_builder.emit('Sub', [const_one, label])
log_p = graph_builder.emit('Log', [p])
log_one_sub_p = graph_builder.emit('Log', [one_sub_p])
res_tmp_1 = graph_builder.emit('Mul', [one_sub_label, log_one_sub_p])
res_tmp_2 = graph_builder.emit('Mul', [label, log_p])
res_tmp = graph_builder.emit('Add', [res_tmp_1, res_tmp_2])
res = graph_builder.emit('Neg', [res_tmp])
return res
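A NumPy sketch mirroring the decomposition above, including its lack of a log-sum-exp stabilization for large |logits| (illustrative only):

import numpy as np

def sigmoid_cross_entropy_with_logits(logits, label):
    p = 1.0 / (1.0 + np.exp(-logits))
    return -(label * np.log(p) + (1.0 - label) * np.log(1.0 - p))

print(sigmoid_cross_entropy_with_logits(np.array([0.0]), np.array([1.0])))  # ~[0.6931]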

View File

@ -0,0 +1,34 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ===========================================================================
"""generate json desc for SigmoidCrossEntropyWithLogitsGrad"""
from ._utils import Expander, ExpanderInfoValidator as VLD
@VLD.check_all_formats_same
class SigmoidCrossEntropyWithLogitsGrad(Expander):
"""SigmoidCrossEntropyWithLogitsGrad expander"""
def _expand(self, graph_builder):
logits, label, dout = self.inputs
# Calculate sigmoid_cross_entropy_with_logits_grad(logits, label, dout)
# formula of sigmoid_cross_entropy_with_logits_grad is: (sigmoid(logits) - label) * dout
const_one = graph_builder.value(logits.dtype, 1.0)
neg_x = graph_builder.emit('Neg', [logits])
exp_neg_x = graph_builder.emit('Exp', [neg_x])
add_exp = graph_builder.emit('Add', [const_one, exp_neg_x])
sigmoid_res = graph_builder.emit('RealDiv', [const_one, add_exp])
sigmoid_res_sub_label = graph_builder.emit('Sub', [sigmoid_res, label])
res = graph_builder.emit('Mul', [sigmoid_res_sub_label, dout])
return res
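The same gradient as a NumPy one-liner, for reference (hypothetical helper name):

import numpy as np

def sigmoid_cross_entropy_with_logits_grad(logits, label, dout):
    sigmoid_res = 1.0 / (1.0 + np.exp(-logits))
    return (sigmoid_res - label) * dout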

View File

@ -0,0 +1,31 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ===========================================================================
"""generate json desc for SigmoidGrad"""
from ._utils import Expander, ExpanderInfoValidator as VLD
@VLD.check_all_formats_same
class SigmoidGrad(Expander):
"""SigmoidGrad expander"""
def _expand(self, graph_builder):
input_y, dy = self.inputs
# Calculate sigmoid_grad(y, dy)
# formula of sigmoid_grad is: (1 - y) * y * dy
const_one = graph_builder.value(input_y.dtype, 1.0)
one_mins_y = graph_builder.emit('Sub', [const_one, input_y])
y_mul_dy = graph_builder.emit('Mul', [input_y, dy])
res = graph_builder.emit('Mul', [one_mins_y, y_mul_dy])
return res
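Since y is already sigmoid(x), the expander only needs elementwise arithmetic; a quick check in plain Python:

def sigmoid_grad(y, dy):
    # d/dx sigmoid(x) = y * (1 - y), where y = sigmoid(x)
    return (1.0 - y) * y * dy

print(sigmoid_grad(0.5, 1.0))  # 0.25, the slope of sigmoid at x = 0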

View File

@ -0,0 +1,40 @@
# Copyright 2021 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ===========================================================================
"""generate json desc for SoftmaxCrossEntropyWithLogits"""
from mindspore._extends.graph_kernel.model.model import DataFormat as DF
from ._utils import Expander, ExpanderInfoValidator as VLD
@VLD.add_format(DF.DEFAULT, DF.DEFAULT)
class SoftmaxCrossEntropyWithLogits(Expander):
"""SoftmaxCrossEntropyWithLogits expander"""
def _expand(self, graph_builder):
logits, label = self.inputs
# Calculate softmax_cross_entropy_with_logits(logits, label)
# formula of softmax_cross_entropy_with_logits is: -reduce_sum(label * log(softmax(logits)))
axis = (-1,)
max_x = graph_builder.emit('ReduceMax', [logits], attrs={'reduce_axis': axis, 'keep_dims': True})
data_sub = graph_builder.emit('Sub', [logits, max_x])
data_exp = graph_builder.emit('Exp', [data_sub])
data_expsum = graph_builder.emit('ReduceSum', [data_exp], attrs={'reduce_axis': axis, 'keep_dims': True})
data_softmax = graph_builder.emit('RealDiv', [data_exp, data_expsum])
softmax_log = graph_builder.emit('Log', [data_softmax])
label_mul_log = graph_builder.emit('Mul', [label, softmax_log])
tmp_res = graph_builder.emit('ReduceSum', [label_mul_log], attrs={
'reduce_axis': axis, 'keep_dims': True})
loss = graph_builder.emit('Neg', [tmp_res])
dlogits = graph_builder.emit('Sub', [data_softmax, label])
return loss, dlogits
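The ReduceMax/Sub prologue above is the standard max-shift for numerical stability; a minimal NumPy sketch of the same loss and gradient:

import numpy as np

def softmax_cross_entropy_with_logits(logits, label):
    shifted = logits - logits.max(axis=-1, keepdims=True)  # ReduceMax/Sub for stability
    exp = np.exp(shifted)
    softmax = exp / exp.sum(axis=-1, keepdims=True)
    loss = -(label * np.log(softmax)).sum(axis=-1, keepdims=True)
    return loss, softmax - label

logits = np.array([[2.0, 1.0, 0.1]])
label = np.array([[1.0, 0.0, 0.0]])
print(softmax_cross_entropy_with_logits(logits, label))  # loss ~0.417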

View File

@ -21,7 +21,7 @@ class SqrtGrad(Expander):
"""SqrtGrad expander"""
def _expand(self, graph_builder):
# sqrt_grad(x, dout) = dout / (2 * x)
# formula of sqrt_grad is dout / (2 * x)
x, dout = self.inputs
const_two = graph_builder.value(x.dtype, 2)
dividend = graph_builder.emit('Mul', [x, const_two])

View File

@ -177,7 +177,6 @@ class PrimLib:
'ReduceMax': Prim(REDUCE),
'ReduceMin': Prim(REDUCE),
'MakeTuple': Prim(CONTROL),
'ControlDepend': Prim(CONTROL),
'Assign': Prim(ELEMWISE),
'Tanh': Prim(ELEMWISE),
'ExpandDims': Prim(RESHAPE),

View File

@ -18,7 +18,7 @@ import os
import sys
from te.platform.cce_conf import te_set_version
from te.platform.fusion_util import fusion_op
import te
import tbe.common.context.op_info as operator_info
sys.path.append(os.path.abspath(os.path.dirname(__file__)))
# pylint: disable=wrong-import-position
from tbe_common import check_kernel_info, get_args, get_built_in_impl_path
@ -68,6 +68,7 @@ def build_op(build_type, json_str, tune_mode=None):
check_kernel_info(kernel_info)
te_set_version(kernel_info["op_info"]["socVersion"])
op_name = kernel_info['op_info']['name']
op_type = kernel_info['op_info']['Type']
try:
custom_flag = False
@ -114,17 +115,19 @@ def build_op(build_type, json_str, tune_mode=None):
# call function
if is_dynamic_shape:
# with te.op.dynamic():
import tbe.common.context.op_context as op_context
with op_context.OpContext("dynamic"):
op_info = operator_info.OpInfo(op_type, op_type)
op_context.get_context().add_op_info(op_info)
op_func(*inputs_args, *outputs_args, *attrs_args, kernel_name=kernel_name)
compile_info = op_context.get_context().get_compile_info()
if tune_mode is not None:
return (te.op.get_compile_info()), (inputs_args, outputs_args, attrs_args), op_module_name
return te.op.get_compile_info()
return compile_info, (inputs_args, outputs_args, attrs_args), op_module_name
return compile_info
else:
res = op_func(*inputs_args, *outputs_args, *attrs_args, kernel_name=kernel_name)
if tune_mode is not None:
return res, (inputs_args, outputs_args, attrs_args), op_module_name
return None, (inputs_args, outputs_args, attrs_args), op_module_name
return res
except Exception as e:

View File

@ -143,7 +143,6 @@ def single_to_fusion(json_file, tune_mode):
"l1_size": -1,
"op_list": ops
}
# op_info = {"fusion_op": end_file}
res = json.dumps(end_file, ensure_ascii=False)
return res

View File

@ -34,6 +34,8 @@ RL_COMPILE = "RL_COMPILE"
RL_OFFLINE = "RL_OFFLINE"
RL_ONLINE = "RL_ONLINE"
COMPILE_TIME_OUT_SECONDS = 600
def create_tbe_parallel_process():
"""
@ -102,8 +104,8 @@ def run_compiler(op_json):
"""
try:
tbe_compiler = os.path.join(os.path.split(os.path.realpath(__file__))[0], "compiler.py")
completed_object = subprocess.run([sys.executable, tbe_compiler], input=op_json, timeout=300,
text=True, capture_output=True, check=True)
completed_object = subprocess.run([sys.executable, tbe_compiler], input=op_json,
timeout=COMPILE_TIME_OUT_SECONDS, text=True, capture_output=True, check=True)
return "Success", completed_object.stderr
except subprocess.TimeoutExpired:
tb = traceback.format_exc()
@ -163,7 +165,7 @@ class TbeProcess:
res = "TBEException", \
"ERROR: [MS_BUILD_PROCESS_NUM] should be in range(1, 25), but got : " + str(process_num)
elif not process_num.isdigit():
res = "TBEException", "ERROR: [MS_BUILD_PROCESS_NUM] type should be a int num, but got :" + process_num
res = "TBEException", "ERROR: [MS_BUILD_PROCESS_NUM] type should be an int num, but got :" + process_num
return res
def init_auto_tune_env(self, tune_mode):
@ -331,6 +333,8 @@ class TbeProcess:
if tune_mode == RL_TUNE:
ret, job_type, compile_info = self.__tuner.rl_tune(task_id, op_json)
if isinstance(compile_info, dict):
compile_info = json.dumps(compile_info)
if job_type is RL_OFFLINE or job_type is RL_ONLINE:
if not ret:
# offline and online hit will return false
@ -361,7 +365,7 @@ class TbeProcess:
ret = 0, "Failed", "Failed"
if self.__running_tasks:
task_id, task_future = self.__running_tasks.pop(0)
ret_type, result = task_future.get(330)
ret_type, result = task_future.get(COMPILE_TIME_OUT_SECONDS)
if ret_type == "Success":
ret = task_id, "Success", result
elif ret_type in ("Exception", "TBEException"):
@ -388,7 +392,7 @@ class TbeProcess:
for item in ret:
task_id = item['task_id']
status_code = item['status_code']
compile_info = item["op_res"] if "op_res" in item else "{}"
compile_info = json.dumps(item["op_res"] if "op_res" in item else None)
res = None
if status_code == 0:
res = task_id, "Success", compile_info

View File

@ -170,6 +170,23 @@ class TbeTuner:
return soc_info
def check_te_log(self, te_log_level):
"""
Check te log level
:param te_log_level:
:return:
"""
res = True
if te_log_level.isdigit() and int(te_log_level) >= len(TE_LOG_LEVEL):
log.error(f"Invalid environment TE_LOGLEVEL, the value should be in [0, 4) if it is a digit, but got : "
f"{te_log_level}")
res = False
elif te_log_level.upper() not in TE_LOG_LEVEL:
log.error(f"Invalid environment TE_LOGLEVEL, the value should be one of [DEBUG, INFO, WARNING, ERROR] "
f"if it is a string, but got :{te_log_level}")
res = False
return res
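The acceptance rule this refactor extracts can be summarized in a standalone sketch (TE_LOG_LEVEL assumed to be the module's four-level constant):

TE_LOG_LEVEL = ['DEBUG', 'INFO', 'WARNING', 'ERROR']  # assumed module-level constant

def is_valid_te_log_level(value):
    # Same rule as check_te_log: digits must fall in [0, 4),
    # strings must name one of the four levels (case-insensitive).
    if value.isdigit():
        return int(value) < len(TE_LOG_LEVEL)
    return value.upper() in TE_LOG_LEVEL

assert is_valid_te_log_level('2') and is_valid_te_log_level('info')
assert not is_valid_te_log_level('7') and not is_valid_te_log_level('TRACE')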
def parallel_compilation_init(self, soc_info, tune_mode, process_num):
"""
Initialize parallel compilation framework for tuner
@ -201,14 +218,7 @@ class TbeTuner:
os.environ["TE_LOGLEVEL"] = TE_LOG_LEVEL[2]
global_loglevel = 3
else:
# pylint: disable=no-else-return
if te_log_level.isdigit() and int(te_log_level) >= len(TE_LOG_LEVEL):
log.error(f"Invalid environment TE_LOGLEVEL, the value should be in [0, 4) if it is a digit, but got : "
f"{te_log_level}")
return False
elif te_log_level.upper() not in TE_LOG_LEVEL:
log.error(f"Invalid environment TE_LOGLEVEL, the value should be one of [DEBUG, INFO, WARNING, ERROR] "
f"if it is a string, but got :{te_log_level}")
if not self.check_te_log(te_log_level):
return False
global_loglevel = int(te_log_level) if te_log_level.isdigit() else TE_LOG_LEVEL.index(te_log_level.upper())
ret = init_multi_process_env(embedding, soc_info, tune_mode, global_loglevel, enable_event, pid_ts)
@ -296,7 +306,7 @@ class TbeTuner:
# todo build with build_single_op_from_c
base_kernel = './kernel_meta/' + kernel_name + '.o'
job_type = RL_COMPILE
compile_info = "{}"
compile_info = None
try:
compile_info, op_args, op_module_name = build_op(OP_BUILD, json.dumps(json_info), tune_mode)
# pylint: disable=broad-except
@ -317,7 +327,7 @@ class TbeTuner:
self.module_list[op_module_name] = 1
self.fusion_need_sync += 1
return ret, job_type, json.dumps(compile_info)
return ret, job_type, compile_info
def fusion_rl_tune(self, task_id, json_info):
"""
@ -334,6 +344,7 @@ class TbeTuner:
converted_json = fusion_to_fusion(json.dumps(json_info), tune_mode="RL")
job_type = RL_COMPILE
base_kernel = './kernel_meta/' + kernel_name + '.o'
compile_info = None
try:
fusion_op(converted_json)
# pylint: disable=broad-except
@ -341,7 +352,7 @@ class TbeTuner:
exc_type, exc_value, _ = sys.exc_info()
log.error(
"exc_type:{}, exc_value:{}, exc_traceback:{}".format(exc_type, exc_value, traceback.format_exc()))
return False, job_type
return False, job_type, compile_info
if self.offline_tune:
job_type = RL_OFFLINE
dump_fusion_json(converted_json, self.offline_dump_path)
@ -351,7 +362,7 @@ class TbeTuner:
l1size = 0
ret = dispatch_fusion_tune_task(graph_id, task_id, l1size, base_kernel, kernel_name, full_name,
converted_json)
return ret, job_type
return ret, job_type, compile_info
def fusion_ga_tune(self, task_id, json_info):
"""

View File

@ -21,13 +21,16 @@ from .parser import (Parser, create_obj_instance, generate_scope,
get_class_member_namespace_symbol, create_slice_obj,
get_dataclass_attributes, get_dataclass_methods, get_obj_id,
get_module_namespace, get_obj_type, get_object_key,
get_ast_type, get_node_type, get_args, get_args_default_values,
get_ast_namespace_symbol, get_operation_namespace_symbol,
get_parse_method_of_class, get_scope_name, expand_expr_statement,
is_class_member, parse_cb, resolve_symbol, convert_to_ms_tensor, get_object_description)
from .serialize import *
__all__ = ['parse_cb', 'get_parse_method_of_class', 'get_bprop_method_of_class', 'resolve_symbol',
'get_object_key', 'get_class_instance_type', 'is_class_member',
'get_obj_type', 'get_obj_id', 'create_obj_instance', 'get_module_namespace',
'get_object_key', 'get_class_instance_type', 'is_class_member', 'get_ast_type', 'get_node_type',
'get_args_default_values', 'get_ast_namespace_symbol', 'get_operation_namespace_symbol',
'get_args', 'get_obj_type', 'get_obj_id', 'create_obj_instance', 'get_module_namespace',
'get_class_member_namespace_symbol', 'get_obj_id', 'Parser', 'get_dataclass_attributes',
'get_dataclass_methods', 'dump_obj', 'load_obj', 'get_dataclass_methods', 'get_scope_name',
'create_slice_obj', 'convert_to_ms_tensor', 'get_object_description', 'expand_expr_statement']

View File

@ -371,6 +371,89 @@ def expand_expr_statement(node):
return (False,)
def get_ast_namespace_symbol(obj):
"""Get obj type and namespace and symbol."""
# step 1: get symbol from object map
ops_info = parse_object_map.get(type(obj), SYMBOL_UNDEFINE)
logger.debug("ops info = %r", ops_info)
return ops_info
def get_operation_namespace_symbol(var: str):
"""Get operation namespace and symbol."""
ops_info = (trope_ns, var)
logger.debug("get operation ops info = %r", ops_info)
return ops_info
def get_ast_type(node):
"""Get the ast type."""
ast_type = AST_SUB_TYPE_UNKNOWN
if isinstance(node, ast.And):
ast_type = AST_SUB_TYPE_AND
elif isinstance(node, ast.Or):
ast_type = AST_SUB_TYPE_OR
elif isinstance(node, ast.Name):
ast_type = AST_SUB_TYPE_NAME
elif isinstance(node, ast.Tuple):
ast_type = AST_SUB_TYPE_TUPLE
elif isinstance(node, ast.Subscript):
ast_type = AST_SUB_TYPE_SUBSCRIPT
elif isinstance(node, ast.Starred):
ast_type = AST_SUB_TYPE_STARRED
elif isinstance(node, ast.Attribute):
ast_type = AST_SUB_TYPE_ATTRIBUTE
else:
ast_type = AST_SUB_TYPE_UNKNOWN
return ast_type
def get_node_type(node):
"""Process an ast node."""
method_name = f'{node.__class__.__name__}'
node_type = [method_name]
# judge the ast main type
if isinstance(node, ast.stmt):
node_type.append(AST_MAIN_TYPE_STMT)
elif isinstance(node, (ast.expr, ast.slice)) or node is None:
# ast.slice and ast.expr should be expr
node_type.append(AST_MAIN_TYPE_EXPR)
else:
node_type.append(AST_MAIN_TYPE_UNKNOWN)
return node_type
def get_args_default_values(node):
"""get the args'default values of parse object."""
nondefaults = [None] * (len(node.args.args) - len(node.args.defaults))
defaults = nondefaults + node.args.defaults + node.args.kw_defaults
if node.args.vararg:
defaults.append(None)
if node.args.kwarg:
defaults.append(None)
return defaults
def get_args(node):
"""Get the arg of parse object."""
args = []
# process position args
for arg in node.args.args:
args.append(arg)
# process kwonlyargs: kwonlyargs is appended after position args
if node.args.kwonlyargs:
for kwarg in node.args.kwonlyargs:
args.append(kwarg)
# process vararg: vararg is appended after kwonlyargs
if node.args.vararg:
args.append(node.args.vararg)
# process kwarg: kwarg is appended after vararg
if node.args.kwarg:
args.append(node.args.kwarg)
return args
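Assuming the two module-level functions above are in scope, a quick check of the collection order (positional, kw-only, *vararg, **kwarg):

import ast

tree = ast.parse("def f(a, b=1, *args, c, **kw): pass")
fn = tree.body[0]
print([a.arg for a in get_args(fn)])     # ['a', 'b', 'c', 'args', 'kw']
print(len(get_args_default_values(fn)))  # 5: [None, Constant(1), None, None, None]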
class Parser:
"""
Parser python code to ast tree.
@ -416,102 +499,28 @@ class Parser:
idt_err.filename = self.filename
idt_err.lineno = self.line_offset
idt_err.msg = f"There are incorrect indentations in definition or comment of function: " \
f"'{self.fn.__qualname__}'."
f"'{self.fn.__qualname__}'."
raise idt_err
Parser.ast_cache[hexstr] = tree
else:
logger.error("Fn type is invalid")
return tree
def get_args(self, node):
"""Get the arg of parse object."""
args = []
# process position args
for arg in node.args.args:
args.append(arg)
# process kwonlyargs: kwonlyargs is append after position args
if node.args.kwonlyargs:
for kwarg in node.args.kwonlyargs:
args.append(kwarg)
# process vararg: vararg is append after kwonlyargs
if node.args.vararg:
args.append(node.args.vararg)
# process kwarg: kwarg is append after vararg
if node.args.kwarg:
args.append(node.args.kwarg)
return args
def get_args_default_values(self, node):
"""get the args'default values of parse object."""
nondefaults = [None] * (len(node.args.args) - len(node.args.defaults))
defaults = nondefaults + node.args.defaults + node.args.kw_defaults
if node.args.vararg:
defaults.append(None)
if node.args.kwarg:
defaults.append(None)
return defaults
def get_node_type(self, node):
"""Process an ast node."""
method_name = f'{node.__class__.__name__}'
node_type = [method_name]
# judge the ast main type
if isinstance(node, ast.stmt):
node_type.append(AST_MAIN_TYPE_STMT)
elif isinstance(node, (ast.expr, ast.slice)) or node is None:
# ast.slice and ast.expr should be expr
node_type.append(AST_MAIN_TYPE_EXPR)
else:
node_type.append(AST_MAIN_TYPE_UNKNOWN)
return node_type
def get_ast_type(self, node):
"""Get the ast type."""
ast_type = AST_SUB_TYPE_UNKNOWN
if isinstance(node, ast.And):
ast_type = AST_SUB_TYPE_AND
elif isinstance(node, ast.Or):
ast_type = AST_SUB_TYPE_OR
elif isinstance(node, ast.Name):
ast_type = AST_SUB_TYPE_NAME
elif isinstance(node, ast.Tuple):
ast_type = AST_SUB_TYPE_TUPLE
elif isinstance(node, ast.Subscript):
ast_type = AST_SUB_TYPE_SUBSCRIPT
elif isinstance(node, ast.Starred):
ast_type = AST_SUB_TYPE_STARRED
elif isinstance(node, ast.Attribute):
ast_type = AST_SUB_TYPE_ATTRIBUTE
else:
ast_type = AST_SUB_TYPE_UNKNOWN
return ast_type
def get_namespace_symbol(self, var: str):
"""Get symbol type and namespace and symbol."""
if var in self.closure_namespace:
ops_info = (self.closure_namespace, var)
logger.debug("in closure_namespace")
elif var in self.global_namespace:
ops_info = (self.global_namespace, var)
return self.closure_namespace, var
if var in self.global_namespace:
logger.debug("in global_namespace")
else:
ops_info = parse_object_map.get(SYMBOL_UNDEFINE)
ops_info = [ops_info[0], var]
return ops_info
def get_operation_namespace_symbol(self, var: str):
"""Get operation namespace and symbol."""
ops_info = (trope_ns, var)
logger.debug("get operation ops info = %r", ops_info)
return ops_info
def get_ast_namespace_symbol(self, obj):
"""Get obj type and namespace and symbol."""
# step 1:get symbol from object map
ops_info = parse_object_map.get(type(obj), SYMBOL_UNDEFINE)
logger.debug("ops info = %r", ops_info)
return ops_info
value = self.global_namespace[var]
if isinstance(value, type(abs)) and self.global_namespace[var] not in convert_object_map:
error_info = f"The builtin function '{var}' is not supported in graph mode."
return None, var, error_info
return self.global_namespace, var
error_info = f"The name '{var}' is not defined."
return None, var, error_info
def analyze_super(self, class_type_node, subclass_instance):
"""Analyze super and return a class instance."""

View File

@ -214,7 +214,7 @@ set(SUB_COMP
frontend/operator
pipeline/jit
pipeline/pynative
common debug pybind_api utils vm profiler ps mindquantum
common debug pybind_api utils vm profiler ps
)
foreach(_comp ${SUB_COMP})

View File

@ -53,7 +53,7 @@ if(ENABLE_CPU)
set_property(SOURCE ${QUANTUM_SRC_LIST} PROPERTY COMPILE_DEFINITIONS
SUBMODULE_ID=mindspore::SubModuleId::SM_MINDQUANTUM)
set_property(SOURCE ${QUANTUM_SRC_LIST} PROPERTY COMPILE_DEFINITIONS INTRIN)
set_property(SOURCE ${QUANTUM_SRC_LIST} PROPERTY COMPILE_OPTIONS -fopenmp -march=native -ffast-math)
set_property(SOURCE ${QUANTUM_SRC_LIST} PROPERTY COMPILE_OPTIONS -fopenmp -mavx -ffast-math)
else()
message("not compiled quantum kernel_compiler")
set(QUANTUM_SRC_LIST "")

View File

@ -102,7 +102,8 @@ bool AkgKernelBuilder::AkgOpParallelBuild(const std::vector<JsonNodePair> &build
return true;
}
kernel::KernelBuildClient *client = GetClient();
auto client = GetClient();
MS_EXCEPTION_IF_NULL(client);
if (!client->AkgStart(PROCESS_NUM, TIME_OUT)) {
MS_LOG(ERROR) << "Akg start failed.";
return false;

View File

@ -50,7 +50,6 @@ class AkgKernelBuilder {
bool AkgOpParallelBuild(const std::vector<JsonNodePair> &build_args);
std::vector<JsonNodePair> repeat_nodes_;
};
} // namespace kernel
} // namespace mindspore

View File

@ -76,15 +76,16 @@ void ArithmeticCPUKernel::RealDiv(const T *input1, const T *input2, T *out, size
GenIndex(i, &idx);
auto dividend = input1[idx[0]];
auto divisor = input2[idx[1]];
if (divisor == 0) {
if (dividend == 0) {
auto zero = (T)0;
if (divisor == zero) {
if (dividend == zero) {
out[i] = std::numeric_limits<T>::quiet_NaN();
continue;
}
if (std::numeric_limits<T>::has_infinity) {
out[i] = dividend > 0 ? std::numeric_limits<T>::infinity() : -std::numeric_limits<T>::infinity();
out[i] = dividend > zero ? std::numeric_limits<T>::infinity() : -std::numeric_limits<T>::infinity();
} else {
out[i] = dividend > 0 ? std::numeric_limits<T>::max() : std::numeric_limits<T>::min();
out[i] = dividend > zero ? std::numeric_limits<T>::max() : std::numeric_limits<T>::min();
}
continue;
}
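The zero-divisor convention above, restated for float types as a short Python check (hypothetical helper; integer instantiations clamp to the type's max/min instead of using infinity):

import math

def safe_div(dividend, divisor):
    # 0/0 -> NaN; x/0 -> +/-inf, matching std::numeric_limits<T>::infinity()
    if divisor == 0:
        if dividend == 0:
            return math.nan
        return math.inf if dividend > 0 else -math.inf
    return dividend / divisor

print(safe_div(1.0, 0.0), safe_div(0.0, 0.0), safe_div(-3.0, 0.0))  # inf nan -inf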
@ -102,15 +103,16 @@ void ArithmeticCPUKernel::Div(const T *input1, const T *input2, T *out, size_t s
GenIndex(i, &idx);
auto dividend = input1[idx[0]];
auto divisor = input2[idx[1]];
if (divisor == 0) {
if (dividend == 0) {
auto zero = (T)0;
if (divisor == zero) {
if (dividend == zero) {
out[i] = std::numeric_limits<T>::quiet_NaN();
continue;
}
if (std::numeric_limits<T>::has_infinity) {
out[i] = dividend > 0 ? std::numeric_limits<T>::infinity() : -std::numeric_limits<T>::infinity();
out[i] = dividend > zero ? std::numeric_limits<T>::infinity() : -std::numeric_limits<T>::infinity();
} else {
out[i] = dividend > 0 ? std::numeric_limits<T>::max() : std::numeric_limits<T>::min();
out[i] = dividend > zero ? std::numeric_limits<T>::max() : std::numeric_limits<T>::min();
}
continue;
}
@ -128,19 +130,20 @@ void ArithmeticCPUKernel::FloorDiv(const T *input1, const T *input2, T *out, siz
GenIndex(i, &idx);
auto dividend = input1[idx[0]];
auto divisor = input2[idx[1]];
if (divisor == 0) {
if (dividend == 0) {
auto zero = (T)0;
if (divisor == zero) {
if (dividend == zero) {
out[i] = std::numeric_limits<T>::quiet_NaN();
continue;
}
if (std::numeric_limits<T>::has_infinity) {
out[i] = dividend > 0 ? std::numeric_limits<T>::infinity() : -std::numeric_limits<T>::infinity();
out[i] = dividend > zero ? std::numeric_limits<T>::infinity() : -std::numeric_limits<T>::infinity();
} else {
out[i] = dividend > 0 ? std::numeric_limits<T>::max() : std::numeric_limits<T>::min();
out[i] = dividend > zero ? std::numeric_limits<T>::max() : std::numeric_limits<T>::min();
}
continue;
}
out[i] = floor(dividend / divisor);
out[i] = (T)floor(static_cast<double>(dividend) / static_cast<double>(divisor));
}
};
CPUKernelUtils::ParallelFor(task, size);
@ -295,7 +298,7 @@ void ArithmeticCPUKernel::Atan2(const T *input1, const T *input2, T *out, size_t
for (size_t i = start; i < end; i++) {
std::vector<size_t> idx;
GenIndex(i, &idx);
out[i] = atan2(input1[idx[0]], input2[idx[1]]);
out[i] = (T)atan2(static_cast<double>(input1[idx[0]]), static_cast<double>(input2[idx[1]]));
}
};
CPUKernelUtils::ParallelFor(task, size);
@ -348,8 +351,8 @@ void ArithmeticCPUKernel::InitKernel(const CNodePtr &kernel_node) {
CPUKernelUtils::GetElementNumEveryDim(input_shape0_, &input_element_num0_);
CPUKernelUtils::GetElementNumEveryDim(input_shape1_, &input_element_num1_);
CPUKernelUtils::GetElementNumEveryDim(output_shape_, &output_element_num_);
dtype_ = AnfAlgo::GetPrevNodeOutputInferDataType(kernel_node, 0);
if (dtype_ != AnfAlgo::GetPrevNodeOutputInferDataType(kernel_node, 1)) {
dtype_ = AnfAlgo::GetInputDeviceDataType(kernel_node, 0);
if (dtype_ != AnfAlgo::GetInputDeviceDataType(kernel_node, 1)) {
MS_LOG(EXCEPTION) << "Input0 and input1 must has the same data type";
}
target_dtype_ = AnfAlgo::GetOutputInferDataType(kernel_node, 0);
@ -358,14 +361,26 @@ void ArithmeticCPUKernel::InitKernel(const CNodePtr &kernel_node) {
bool ArithmeticCPUKernel::Launch(const std::vector<kernel::AddressPtr> &inputs,
const std::vector<kernel::AddressPtr> & /*workspace*/,
const std::vector<kernel::AddressPtr> &outputs) {
if (dtype_ == kNumberTypeInt32 || dtype_ == kNumberTypeInt16 || dtype_ == kNumberTypeInt8) {
if (dtype_ == kNumberTypeInt32) {
LaunchKernel<int>(inputs, outputs);
} else if (dtype_ == kNumberTypeFloat32 || dtype_ == kNumberTypeFloat16 || dtype_ == kNumberTypeFloat64) {
} else if (dtype_ == kNumberTypeFloat32) {
LaunchKernel<float>(inputs, outputs);
} else if (dtype_ == kNumberTypeInt64) {
LaunchKernel<int64_t>(inputs, outputs);
} else if (dtype_ == kNumberTypeBool) {
LaunchKernelLogic<bool>(inputs, outputs);
} else if (dtype_ == kNumberTypeInt8) {
LaunchKernel<int8_t>(inputs, outputs);
} else if (dtype_ == kNumberTypeInt16) {
LaunchKernel<int16_t>(inputs, outputs);
} else if (dtype_ == kNumberTypeFloat16) {
LaunchKernel<float16>(inputs, outputs);
} else if (dtype_ == kNumberTypeFloat64) {
LaunchKernel<double>(inputs, outputs);
} else if (dtype_ == kNumberTypeUInt8) {
LaunchKernel<uint8_t>(inputs, outputs);
} else if (dtype_ == kNumberTypeUInt32) {
LaunchKernel<uint32_t>(inputs, outputs);
} else {
MS_LOG(EXCEPTION) << "Data type " << TypeIdLabel(dtype_) << "is not support.";
}

View File

@ -30,7 +30,6 @@ void AssignCPUKernel::InitKernel(const CNodePtr &kernel_node) {
MS_EXCEPTION_IF_NULL(kernel_node);
auto input_x_shape = AnfAlgo::GetPrevNodeOutputInferShape(kernel_node, 0);
auto input_y_shape = AnfAlgo::GetPrevNodeOutputInferShape(kernel_node, 1);
if (input_x_shape.size() != input_y_shape.size()) MS_LOG(EXCEPTION) << "x y must be same shape";
for (size_t i = 0; i < input_x_shape.size(); ++i) {
if (input_x_shape[i] != input_y_shape[i]) {

View File

@ -57,7 +57,7 @@ bool BiasAddCPUKernel::Launch(const std::vector<AddressPtr> &inputs, const std::
size_t offset = n * c_size * hw_size + c * hw_size;
size_t hw = 0;
#ifdef ENABLE_AVX
constexpr size_t C8NUM = 8;
const size_t C8NUM = 8;
size_t hw8 = hw_size / C8NUM * C8NUM;
const float *in_ptr = src_addr + offset;
float *out_ptr = output_addr + offset;

View File

@ -1,113 +0,0 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "backend/kernel_compiler/cpu/cache_swap_hashmap_cpu_kernel.h"
#include <string>
#include "runtime/device/cpu/cpu_device_address.h"
namespace mindspore {
namespace kernel {
template <typename T>
void Compress(HashmapEntry<T> *entry_p, const size_t &length, T entry) {
T i = (entry + 1) % length, off = 1;
for (; !entry_p[i].IsEmpty(); i = (i + 1) % length, off++) {
if (entry_p[i].tag > off) {
entry_p[entry].key = entry_p[i].key;
entry_p[entry].value = entry_p[i].value;
entry_p[entry].step = entry_p[i].step;
entry_p[entry].tag = entry_p[i].tag - off;
entry_p[i].SetEmpty();
off = 0;
entry = i;
}
}
}
void CacheSwapHashmapCPUKernel::InitKernel(const CNodePtr &kernel_node) {
auto hashmap_shape = AnfAlgo::GetPrevNodeOutputInferShape(kernel_node, 0);
auto emb_idx_shape = AnfAlgo::GetPrevNodeOutputInferShape(kernel_node, 1);
if (hashmap_shape.size() != 2) {
MS_LOG(EXCEPTION) << "Dimension of HashMap must be 2, (n, 4)";
}
for (size_t i = 0; i < emb_idx_shape.size(); ++i) {
batch_size_ *= emb_idx_shape[i];
}
hashmap_length_ = hashmap_shape[0];
if (hashmap_length_ <= 0) {
MS_LOG(EXCEPTION) << "Hashmap length must > 0";
}
dtype_ = AnfAlgo::GetPrevNodeOutputInferDataType(kernel_node, 0);
}
bool CacheSwapHashmapCPUKernel::Launch(const std::vector<kernel::AddressPtr> &inputs,
const std::vector<kernel::AddressPtr> & /*workspace*/,
const std::vector<kernel::AddressPtr> &outputs) {
if (dtype_ == kNumberTypeInt32) {
LaunchKernel<int>(inputs, outputs);
} else if (dtype_ == kNumberTypeInt64) {
LaunchKernel<int64_t>(inputs, outputs);
} else {
MS_LOG(ERROR) << "Only support int32, int64";
return false;
}
return true;
}
template <typename T>
void CacheSwapHashmapCPUKernel::LaunchKernel(const std::vector<AddressPtr> &inputs,
const std::vector<kernel::AddressPtr> &outputs) {
HashmapEntry<T> *hashmap = reinterpret_cast<HashmapEntry<T> *>(inputs[0]->addr);
auto miss_emb_idx = reinterpret_cast<T *>(inputs[1]->addr);
step_ = *reinterpret_cast<T *>(inputs[2]->addr);
auto swap_cache_idx = reinterpret_cast<T *>(outputs[0]->addr);
auto old_emb_idx = reinterpret_cast<T *>(outputs[1]->addr);
for (size_t i = 0; i < batch_size_; ++i) {
if (miss_emb_idx[i] < 0) {
swap_cache_idx[i] = -1;
old_emb_idx[i] = -1;
} else {
T emb_idx = miss_emb_idx[i];
T entry = HashFunc(emb_idx, hashmap_length_);
T tag_count = 1;
while (!hashmap[entry].IsEmpty()) {
entry = (entry + 1) % hashmap_length_;
tag_count++;
}
hashmap[entry].key = emb_idx;
hashmap[entry].step = step_;
hashmap[entry].tag = tag_count;
T tmp_entry = (entry + 1) % hashmap_length_;
while (hashmap[tmp_entry].IsEmpty() || hashmap[tmp_entry].IsUsing(step_)) {
tmp_entry = (tmp_entry + 1) % hashmap_length_;
}
swap_cache_idx[i] = hashmap[tmp_entry].value;
old_emb_idx[i] = hashmap[tmp_entry].key;
hashmap[entry].value = swap_cache_idx[i];
hashmap[tmp_entry].SetEmpty();
Compress(hashmap, hashmap_length_, tmp_entry);
}
}
}
} // namespace kernel
} // namespace mindspore

View File

@ -1,87 +0,0 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_CACHE_SWAP_HASHMAP_CPU_KERNEL_H_
#define MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_CACHE_SWAP_HASHMAP_CPU_KERNEL_H_
#include <vector>
#include <memory>
#include <unordered_map>
#include "backend/kernel_compiler/cpu/cpu_kernel.h"
#include "backend/kernel_compiler/cpu/cpu_kernel_factory.h"
#include "backend/kernel_compiler/cpu/search_cache_idx_cpu_kernel.h"
namespace mindspore {
namespace kernel {
class CacheSwapHashmapCPUKernel : public CPUKernel {
public:
CacheSwapHashmapCPUKernel() = default;
~CacheSwapHashmapCPUKernel() override = default;
void InitKernel(const CNodePtr &kernel_node) override;
bool Launch(const std::vector<AddressPtr> &inputs, const std::vector<AddressPtr> &workspace,
const std::vector<AddressPtr> &outputs) override;
template <typename T>
void LaunchKernel(const std::vector<AddressPtr> &inputs, const std::vector<kernel::AddressPtr> &outputs);
private:
size_t batch_size_{1};
size_t hashmap_length_{1};
int64_t step_{0};
TypeId dtype_{kTypeUnknown};
};
MS_REG_CPU_KERNEL(CacheSwapHashmap,
KernelAttr()
.AddInputAttr(kNumberTypeInt32)
.AddInputAttr(kNumberTypeInt32)
.AddInputAttr(kNumberTypeInt32)
.AddOutputAttr(kNumberTypeInt32)
.AddOutputAttr(kNumberTypeInt32),
CacheSwapHashmapCPUKernel);
MS_REG_CPU_KERNEL(CacheSwapHashmap,
KernelAttr()
.AddInputAttr(kNumberTypeInt64)
.AddInputAttr(kNumberTypeInt64)
.AddInputAttr(kNumberTypeInt32)
.AddOutputAttr(kNumberTypeInt64)
.AddOutputAttr(kNumberTypeInt64),
CacheSwapHashmapCPUKernel);
MS_REG_CPU_KERNEL(CacheSwapHashmap,
KernelAttr()
.AddInputAttr(kNumberTypeInt64)
.AddInputAttr(kNumberTypeInt64)
.AddInputAttr(kNumberTypeInt64)
.AddOutputAttr(kNumberTypeInt64)
.AddOutputAttr(kNumberTypeInt64),
CacheSwapHashmapCPUKernel);
MS_REG_CPU_KERNEL(CacheSwapHashmap,
KernelAttr()
.AddInputAttr(kNumberTypeInt32)
.AddInputAttr(kNumberTypeInt32)
.AddInputAttr(kNumberTypeInt64)
.AddOutputAttr(kNumberTypeInt32)
.AddOutputAttr(kNumberTypeInt32),
CacheSwapHashmapCPUKernel);
} // namespace kernel
} // namespace mindspore
#endif // MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_CACHE_SWAP_HASHMAP_CPU_KERNEL_H_

View File

@ -39,8 +39,7 @@ void CastCPUKernel<S, T>::InitKernel(const CNodePtr &kernel_node) {
}
template <typename S, typename T>
bool CastCPUKernel<S, T>::Launch(const std::vector<kernel::AddressPtr> &inputs,
const std::vector<kernel::AddressPtr> & /*workspace*/,
bool CastCPUKernel<S, T>::Launch(const std::vector<kernel::AddressPtr> &inputs, const std::vector<kernel::AddressPtr> &,
const std::vector<kernel::AddressPtr> &outputs) {
S *input = reinterpret_cast<S *>(inputs[0]->addr);
T *output = reinterpret_cast<T *>(outputs[0]->addr);

View File

@ -14,6 +14,8 @@
* limitations under the License.
*/
#include "backend/kernel_compiler/cpu/cpu_kernel.h"
#include <algorithm>
#include <utility>
#include "common/thread_pool.h"
namespace mindspore {
@ -119,5 +121,118 @@ std::vector<size_t> CPUKernelUtils::FlatShapeByAxis(const std::vector<size_t> &s
return flat_shape;
}
BroadcastIterator::BroadcastIterator(std::vector<size_t> input_shape_a, std::vector<size_t> input_shape_b,
std::vector<size_t> output_shape)
: input_shape_a_(std::move(input_shape_a)),
input_shape_b_(std::move(input_shape_b)),
output_shape_(std::move(output_shape)) {
output_dimension_ = SizeToInt(output_shape_.size()); // Assign dimension to int for iterator
BroadcastShape();
// Allocate strides memory
input_strides_a_.resize(output_dimension_);
input_strides_b_.resize(output_dimension_);
input_back_strides_a_.resize(output_dimension_);
input_back_strides_b_.resize(output_dimension_);
coordinates_.resize(output_dimension_);
InitStrides();
}
void BroadcastIterator::SetPos(size_t pos) {
for (int i = output_dimension_ - 1; i >= 0 && pos != 0; --i) {
coordinates_[i] = pos % output_shape_[i];
input_pos_[0] += coordinates_[i] * input_strides_a_[i];
input_pos_[1] += coordinates_[i] * input_strides_b_[i];
pos /= output_shape_[i];
}
}
void BroadcastIterator::GenNextPos() {
// Calculate output next coordinate
for (int i = output_dimension_ - 1; i >= 0; --i) {
if (coordinates_[i] + 1 == output_shape_[i]) {
coordinates_[i] = 0;
input_pos_[0] -= input_back_strides_a_[i];
input_pos_[1] -= input_back_strides_b_[i];
} else {
++coordinates_[i];
input_pos_[0] += input_strides_a_[i];
input_pos_[1] += input_strides_b_[i];
break;
}
}
}
void BroadcastIterator::BroadcastShape() {
int input_dimension_a = input_shape_a_.size();
if (input_dimension_a < output_dimension_) {
input_shape_a_.insert(input_shape_a_.begin(), output_dimension_ - input_dimension_a, 1);
}
int input_dimension_b = input_shape_b_.size();
if (input_dimension_b < output_dimension_) {
input_shape_b_.insert(input_shape_b_.begin(), output_dimension_ - input_dimension_b, 1);
}
}
void BroadcastIterator::InitStrides() {
input_strides_a_[output_dimension_ - 1] = 1;
input_strides_b_[output_dimension_ - 1] = 1;
for (int i = output_dimension_ - 2; i >= 0; --i) {
input_strides_a_[i] = input_shape_a_[i + 1] * input_strides_a_[i + 1];
input_strides_b_[i] = input_shape_b_[i + 1] * input_strides_b_[i + 1];
input_back_strides_a_[i + 1] = (input_shape_a_[i + 1] - 1) * input_strides_a_[i + 1];
input_back_strides_b_[i + 1] = (input_shape_b_[i + 1] - 1) * input_strides_b_[i + 1];
}
// Update strides for broadcast
// While the axis value is 1, the stride is 0
std::transform(input_strides_a_.begin(), input_strides_a_.end(), input_shape_a_.begin(), input_strides_a_.begin(),
[](const auto &a, const auto &b) { return b == 1 ? 0 : a; });
std::transform(input_strides_b_.begin(), input_strides_b_.end(), input_shape_b_.begin(), input_strides_b_.begin(),
[](const auto &a, const auto &b) { return b == 1 ? 0 : a; });
}
TransposeIterator::TransposeIterator(std::vector<size_t> output_shape, std::vector<size_t> axes,
const std::vector<size_t> &input_shape)
: shape_(std::move(output_shape)), axes_(std::move(axes)) {
// Calculate strides
dimension_ = shape_.size();
std::vector<uint32_t> strides(dimension_, 1);
for (int i = dimension_ - 2; i >= 0; --i) {
strides[i] = input_shape[i + 1] * strides[i + 1];
}
// Swap shape and strides and calculate back strides
strides_.resize(dimension_);
back_strides_.resize(dimension_);
for (int i = dimension_ - 1; i >= 0; --i) {
strides_[i] = strides[axes_[i]];
back_strides_[i] = (shape_[i] - 1) * strides_[i];
}
// Calculate coordinate by pos
coordinates_.resize(dimension_);
}
void TransposeIterator::SetPos(size_t pos) {
for (int i = dimension_ - 1; i >= 0 && pos != 0; --i) {
coordinates_[i] = pos % shape_[i];
pos_ += coordinates_[i] * strides_[i];
pos /= shape_[i];
}
}
void TransposeIterator::GenNextPos() {
for (int i = dimension_ - 1; i >= 0; --i) {
if (coordinates_[i] + 1 == shape_[i]) {
coordinates_[i] = 0;
pos_ -= back_strides_[i];
} else {
coordinates_[i]++;
pos_ += strides_[i];
break;
}
}
}
} // namespace kernel
} // namespace mindspore
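The stride trick in BroadcastIterator (stride 0 on size-1 axes) can be checked against NumPy broadcasting; a hypothetical Python restatement, not the C++ class itself:

import numpy as np

def broadcast_positions(shape_a, shape_b, out_shape):
    # For each flat output index, yield flat indices into inputs a and b.
    def strides(shape):
        padded = [1] * (len(out_shape) - len(shape)) + list(shape)
        s = [1] * len(out_shape)
        for i in range(len(out_shape) - 2, -1, -1):
            s[i] = padded[i + 1] * s[i + 1]
        # size-1 axes get stride 0, exactly the broadcast update above
        return [0 if d == 1 else st for d, st in zip(padded, s)]

    sa, sb = strides(shape_a), strides(shape_b)
    for pos in range(int(np.prod(out_shape))):
        coords, rem = [], pos
        for dim in reversed(out_shape):
            coords.append(rem % dim)
            rem //= dim
        coords.reverse()
        yield (sum(c * s for c, s in zip(coords, sa)),
               sum(c * s for c, s in zip(coords, sb)))

a = np.arange(6, dtype=float).reshape(2, 1, 3)
b = np.arange(4, dtype=float).reshape(4, 1)
flat = [a.ravel()[i] + b.ravel()[j] for i, j in broadcast_positions(a.shape, b.shape, (2, 4, 3))]
assert np.allclose(flat, (a + b).ravel())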

View File

@ -145,6 +145,50 @@ class CPUKernelUtils {
static void ParallelFor(const CTask &task, size_t count);
static std::vector<size_t> FlatShapeByAxis(const std::vector<size_t> &shape, int axis);
};
class BroadcastIterator {
public:
BroadcastIterator(std::vector<size_t> input_shape_a, std::vector<size_t> input_shape_b,
std::vector<size_t> output_shape);
virtual ~BroadcastIterator() = default;
inline size_t GetInputPosA() const { return input_pos_[0]; }
inline size_t GetInputPosB() const { return input_pos_[1]; }
void SetPos(size_t pos);
void GenNextPos();
private:
void BroadcastShape();
void InitStrides();
std::vector<size_t> coordinates_;
std::vector<size_t> input_shape_a_;
std::vector<size_t> input_shape_b_;
std::vector<size_t> output_shape_;
std::vector<size_t> input_strides_a_;
std::vector<size_t> input_strides_b_;
std::vector<size_t> input_back_strides_a_;
std::vector<size_t> input_back_strides_b_;
std::array<size_t, 2> input_pos_{0};
int output_dimension_{0};
};
class TransposeIterator {
public:
TransposeIterator(std::vector<size_t> output_shape, std::vector<size_t> axes, const std::vector<size_t> &input_shape);
virtual ~TransposeIterator() = default;
inline size_t GetPos() const { return pos_; }
void SetPos(size_t pos);
void GenNextPos();
private:
int dimension_{0};
std::vector<size_t> coordinates_;
std::vector<size_t> shape_;
std::vector<size_t> strides_;
std::vector<size_t> back_strides_;
std::vector<size_t> axes_;
size_t pos_{0};
};
} // namespace kernel
} // namespace mindspore

View File

@ -19,7 +19,6 @@
namespace mindspore {
namespace kernel {
void CTCLossCPUKernel::InitKernel(const CNodePtr &kernel_node) {
CheckParam(kernel_node);
probs_shape_ = AnfAlgo::GetPrevNodeOutputInferShape(kernel_node, 0);
@ -158,7 +157,6 @@ void CTCLossCPUKernel::CalculateGrad(const std::vector<uint32_t> &label_with_bla
std::vector<std::vector<TT>> *dy) {
auto dy_b = dy;
TT kLogZero_ = -std::numeric_limits<TT>::infinity();
if (log_pzx == kLogZero_) {
MS_LOG(INFO) << "No valid path found";
return;
@ -181,7 +179,7 @@ void CTCLossCPUKernel::CalculateGrad(const std::vector<uint32_t> &label_with_bla
}
}
void CTCLossCPUKernel::GenLableWithBlank(uint32_t *seq_len, const std::vector<std::vector<uint32_t>> &batch_label,
void CTCLossCPUKernel::GenLableWithBlank(const uint32_t *seq_len, const std::vector<std::vector<uint32_t>> &batch_label,
std::vector<std::vector<uint32_t>> *label_with_blank) {
for (size_t b = 0; b < batch_size_; ++b) {
std::vector<uint32_t> l;
@ -216,7 +214,7 @@ void CTCLossCPUKernel::GenLableWithBlank(uint32_t *seq_len, const std::vector<st
}
template <typename T>
void InnerSoftMax(T *inputs_addr, std::vector<std::vector<T>> *softmax_probs, const uint32_t sequence_length,
void InnerSoftMax(const T *inputs_addr, std::vector<std::vector<T>> *softmax_probs, const uint32_t sequence_length,
size_t num_class, size_t batch_size, size_t b) {
for (size_t t = 0; t < sequence_length; ++t) {
T maxCoeff(T(0));

View File

@ -36,7 +36,7 @@ class CTCLossCPUKernel : public CPUKernel {
bool Launch(const std::vector<AddressPtr> &inputs, const std::vector<AddressPtr> &workspace,
const std::vector<AddressPtr> &outputs) override;
void GenLableWithBlank(uint32_t *seq_len, const std::vector<std::vector<uint32_t>> &batch_label,
void GenLableWithBlank(const uint32_t *seq_len, const std::vector<std::vector<uint32_t>> &batch_label,
std::vector<std::vector<uint32_t>> *label_with_blank);
template <typename T>
@ -87,7 +87,6 @@ MS_REG_CPU_KERNEL(CTCLoss,
.AddOutputAttr(kNumberTypeFloat32)
.AddOutputAttr(kNumberTypeFloat32),
CTCLossCPUKernel);
} // namespace kernel
} // namespace mindspore
#endif // MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_CTCLOSS_CPU_KERNEL_H_

View File

@ -54,7 +54,6 @@ MS_REG_CPU_KERNEL(
Dropout,
KernelAttr().AddInputAttr(kNumberTypeFloat32).AddOutputAttr(kNumberTypeFloat32).AddOutputAttr(kNumberTypeFloat32),
DropoutCPUKernel);
} // namespace kernel
} // namespace mindspore
#endif // MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_DROPOUT_CPU_KERNEL_H_

View File

@ -14,15 +14,16 @@
* limitations under the License.
*/
#include <cmath>
#include <string>
#include <thread>
#include <map>
#include "backend/kernel_compiler/cpu/eltwise_grad_cpu_kernel.h"
#include "common/thread_pool.h"
#include "runtime/device/cpu/cpu_device_address.h"
namespace mindspore {
namespace kernel {
template <typename T>
void EltWiseGradCPUKernel::ReluGrad(const T *input1, const T *input2, T *out, size_t start, size_t end) {
void EltWiseGradCPUKernel<T>::ReluGrad(const T *input1, const T *input2, T *out, size_t start, size_t end) {
for (size_t i = start; i < end; i++) {
if (input2[i] > 0) {
out[i] = input1[i];
@ -33,7 +34,7 @@ void EltWiseGradCPUKernel::ReluGrad(const T *input1, const T *input2, T *out, si
}
template <typename T>
void EltWiseGradCPUKernel::ReLU6Grad(const T *input1, const T *input2, T *out, size_t start, size_t end) {
void EltWiseGradCPUKernel<T>::ReLU6Grad(const T *input1, const T *input2, T *out, size_t start, size_t end) {
for (size_t i = start; i < end; i++) {
if (input2[i] > 0 && input2[i] <= 6) {
out[i] = input1[i];
@ -44,7 +45,7 @@ void EltWiseGradCPUKernel::ReLU6Grad(const T *input1, const T *input2, T *out, s
}
template <typename T>
void EltWiseGradCPUKernel::AbsGrad(const T *input1, const T *input2, T *out, size_t start, size_t end) {
void EltWiseGradCPUKernel<T>::AbsGrad(const T *input1, const T *input2, T *out, size_t start, size_t end) {
for (size_t i = start; i < end; i++) {
if (input1[i] > 0) {
out[i] = input2[i];
@ -57,21 +58,21 @@ void EltWiseGradCPUKernel::AbsGrad(const T *input1, const T *input2, T *out, siz
}
template <typename T>
void EltWiseGradCPUKernel::SigmoidGrad(const T *input1, const T *input2, T *out, size_t start, size_t end) {
void EltWiseGradCPUKernel<T>::SigmoidGrad(const T *input1, const T *input2, T *out, size_t start, size_t end) {
for (size_t i = start; i < end; i++) {
out[i] = input2[i] * input1[i] * (1 - input1[i]);
}
}
template <typename T>
void EltWiseGradCPUKernel::SqrtGrad(const T *input1, const T *input2, T *out, size_t start, size_t end) {
void EltWiseGradCPUKernel<T>::SqrtGrad(const T *input1, const T *input2, T *out, size_t start, size_t end) {
for (size_t i = start; i < end; i++) {
out[i] = input2[i] / (input1[i] * 2);
}
}
template <typename T>
void EltWiseGradCPUKernel::TanhGrad(const T *input1, const T *input2, T *out, size_t start, size_t end) {
void EltWiseGradCPUKernel<T>::TanhGrad(const T *input1, const T *input2, T *out, size_t start, size_t end) {
for (size_t i = start; i < end; i++) {
T tmp = input1[i] * input1[i];
out[i] = input2[i] * (1 - tmp);
@ -79,7 +80,7 @@ void EltWiseGradCPUKernel::TanhGrad(const T *input1, const T *input2, T *out, si
}
template <typename T>
void EltWiseGradCPUKernel::GeluGrad(const T *input1, const T *input2, T *out, size_t start, size_t end) {
void EltWiseGradCPUKernel<T>::GeluGrad(const T *input1, const T *input2, T *out, size_t start, size_t end) {
for (size_t i = start; i < end; i++) {
T x = input2[i];
auto double_x = static_cast<T>(x);
@ -91,7 +92,7 @@ void EltWiseGradCPUKernel::GeluGrad(const T *input1, const T *input2, T *out, si
}
template <typename T>
void EltWiseGradCPUKernel::AsinGrad(const T *input1, const T *input2, T *out, size_t start, size_t end) {
void EltWiseGradCPUKernel<T>::AsinGrad(const T *input1, const T *input2, T *out, size_t start, size_t end) {
for (size_t i = start; i < end; i++) {
T dividend = input2[i];
T divisor = sqrt(1 - input1[i] * input1[i]);
@ -112,7 +113,7 @@ void EltWiseGradCPUKernel::AsinGrad(const T *input1, const T *input2, T *out, si
}
template <typename T>
void EltWiseGradCPUKernel::ACosGrad(const T *input1, const T *input2, T *out, size_t start, size_t end) {
void EltWiseGradCPUKernel<T>::ACosGrad(const T *input1, const T *input2, T *out, size_t start, size_t end) {
for (size_t i = start; i < end; i++) {
T dividend = -input2[i];
T divisor = sqrt(1 - input1[i] * input1[i]);
@ -133,10 +134,10 @@ void EltWiseGradCPUKernel::ACosGrad(const T *input1, const T *input2, T *out, si
}
template <typename T>
void EltWiseGradCPUKernel::AtanGrad(const T *input1, const T *input2, T *out, size_t start, size_t end) {
void EltWiseGradCPUKernel<T>::AtanGrad(const T *input1, const T *input2, T *out, size_t start, size_t end) {
for (size_t i = start; i < end; i++) {
T dividend = input2[i];
T divisor = 1 + input1[i] * input1[i];
const T divisor = 1 + input1[i] * input1[i];
if (divisor == 0) {
if (dividend == 0) {
out[i] = std::numeric_limits<T>::quiet_NaN();
@ -154,7 +155,7 @@ void EltWiseGradCPUKernel::AtanGrad(const T *input1, const T *input2, T *out, si
}
template <typename T>
void EltWiseGradCPUKernel::AsinhGrad(const T *input1, const T *input2, T *out, size_t start, size_t end) {
void EltWiseGradCPUKernel<T>::AsinhGrad(const T *input1, const T *input2, T *out, size_t start, size_t end) {
for (size_t i = start; i < end; i++) {
T dividend = input2[i];
T divisor = sqrt(1 + input1[i] * input1[i]);
@ -175,7 +176,7 @@ void EltWiseGradCPUKernel::AsinhGrad(const T *input1, const T *input2, T *out, s
}
template <typename T>
void EltWiseGradCPUKernel::AcoshGrad(const T *input1, const T *input2, T *out, size_t start, size_t end) {
void EltWiseGradCPUKernel<T>::AcoshGrad(const T *input1, const T *input2, T *out, size_t start, size_t end) {
for (size_t i = start; i < end; i++) {
T dividend = input2[i];
T divisor = sqrt(input1[i] * input1[i] - 1);
@ -195,132 +196,46 @@ void EltWiseGradCPUKernel::AcoshGrad(const T *input1, const T *input2, T *out, s
}
}
void EltWiseGradCPUKernel::InitKernel(const CNodePtr &kernel_node) {
template <typename T>
void EltWiseGradCPUKernel<T>::InitKernel(const CNodePtr &kernel_node) {
MS_EXCEPTION_IF_NULL(kernel_node);
std::string kernel_name = AnfAlgo::GetCNodeName(kernel_node);
if (kernel_name == "ReluGrad") {
operate_type_ = RELUGRAD;
} else if (kernel_name == "ReLU6Grad") {
operate_type_ = RELU6GRAD;
} else if (kernel_name == "SigmoidGrad") {
operate_type_ = SIGMOIDGRAD;
} else if (kernel_name == "AbsGrad") {
operate_type_ = ABSGRAD;
} else if (kernel_name == "TanhGrad") {
operate_type_ = TANHGRAD;
} else if (kernel_name == "SqrtGrad") {
operate_type_ = SQRTGRAD;
} else if (kernel_name == "GeLUGrad") {
operate_type_ = GELUGRAD;
} else if (kernel_name == "AsinGrad") {
operate_type_ = ASINGRAD;
} else if (kernel_name == "ACosGrad") {
operate_type_ = ACOSGRAD;
} else if (kernel_name == "AtanGrad") {
operate_type_ = ATANGRAD;
} else if (kernel_name == "AsinhGrad") {
operate_type_ = ASINHGRAD;
} else if (kernel_name == "AcoshGrad") {
operate_type_ = ACOSHGRAD;
} else {
MS_LOG(EXCEPTION) << "Not support " << kernel_name;
}
input_shape0_ = AnfAlgo::GetPrevNodeOutputInferShape(kernel_node, 0);
input_shape1_ = AnfAlgo::GetPrevNodeOutputInferShape(kernel_node, 1);
output_shape_ = AnfAlgo::GetOutputInferShape(kernel_node, 0);
if (output_shape_.size() == 0) {
output_shape_.insert(output_shape_.begin(), 1);
}
size_t l = input_shape0_.size();
for (size_t i = 0; i < output_shape_.size() - l; ++i) {
input_shape0_.insert(input_shape0_.begin(), 1);
}
l = input_shape1_.size();
for (size_t i = 0; i < output_shape_.size() - l; ++i) {
input_shape1_.insert(input_shape1_.begin(), 1);
}
CPUKernelUtils::GetElementNumEveryDim(input_shape0_, &input_element_num0_);
CPUKernelUtils::GetElementNumEveryDim(input_shape1_, &input_element_num1_);
CPUKernelUtils::GetElementNumEveryDim(output_shape_, &output_element_num_);
dtype_ = AnfAlgo::GetPrevNodeOutputInferDataType(kernel_node, 0);
if (dtype_ != AnfAlgo::GetPrevNodeOutputInferDataType(kernel_node, 1)) {
MS_LOG(EXCEPTION) << "Input0 and input1 must has the same data type";
}
}
bool EltWiseGradCPUKernel::Launch(const std::vector<kernel::AddressPtr> &inputs,
const std::vector<kernel::AddressPtr> & /*workspace*/,
const std::vector<kernel::AddressPtr> &outputs) {
if (dtype_ == kNumberTypeInt32 || dtype_ == kNumberTypeInt16) {
LaunchKernel<int>(inputs, outputs);
} else if (dtype_ == kNumberTypeFloat32 || dtype_ == kNumberTypeFloat16 || dtype_ == kNumberTypeFloat64) {
LaunchKernel<float>(inputs, outputs);
} else if (dtype_ == kNumberTypeInt64) {
LaunchKernel<int64_t>(inputs, outputs);
} else {
MS_LOG(EXCEPTION) << "Data type is " << TypeIdLabel(dtype_) << "is not support.";
}
return true;
kernel_name_ = AnfAlgo::GetCNodeName(kernel_node);
}
template <typename T>
void EltWiseGradCPUKernel::LaunchKernel(const std::vector<AddressPtr> &inputs, const std::vector<AddressPtr> &outputs) {
bool EltWiseGradCPUKernel<T>::Launch(const std::vector<kernel::AddressPtr> &inputs,
const std::vector<kernel::AddressPtr> & /*workspace*/,
const std::vector<kernel::AddressPtr> &outputs) {
static const std::map<std::string,
std::function<void(EltWiseGradCPUKernel *, const T *, const T *, T *, size_t, size_t)>>
elt_map{{"ReluGrad", &EltWiseGradCPUKernel<T>::ReluGrad}, {"ReLU6Grad", &EltWiseGradCPUKernel<T>::ReLU6Grad},
{"SigmoidGrad", &EltWiseGradCPUKernel<T>::SigmoidGrad}, {"AbsGrad", &EltWiseGradCPUKernel<T>::AbsGrad},
{"TanhGrad", &EltWiseGradCPUKernel<T>::TanhGrad}, {"SqrtGrad", &EltWiseGradCPUKernel<T>::SqrtGrad},
{"GeLUGrad", &EltWiseGradCPUKernel<T>::GeluGrad}, {"AsinGrad", &EltWiseGradCPUKernel<T>::AsinGrad},
{"ACosGrad", &EltWiseGradCPUKernel<T>::ACosGrad}, {"AtanGrad", &EltWiseGradCPUKernel<T>::AtanGrad},
{"AsinhGrad", &EltWiseGradCPUKernel<T>::AsinhGrad}, {"AcoshGrad", &EltWiseGradCPUKernel<T>::AcoshGrad}};
T *input1 = reinterpret_cast<T *>(inputs[0]->addr);
T *input2 = reinterpret_cast<T *>(inputs[1]->addr);
T *output = reinterpret_cast<T *>(outputs[0]->addr);
size_t lens = outputs[0]->size > 0 ? static_cast<size_t>(outputs[0]->size / sizeof(T)) : 1;
auto max_thread_num = std::thread::hardware_concurrency();
size_t thread_num = lens < 128 * max_thread_num ? std::ceil(lens / 128.0) : max_thread_num;
MS_LOG(INFO) << "Lens=" << lens << "; use thread_num=" << thread_num << "; max_thread_num: " << max_thread_num;
std::vector<std::thread> threads;
if (thread_num < 1) {
MS_LOG(ERROR) << "Invalid value: thread_num " << thread_num;
return;
}
threads.reserve(thread_num);
size_t count = outputs[0]->size > 0 ? static_cast<size_t>(outputs[0]->size / sizeof(T)) : 1;
auto max_thread_num = common::ThreadPool::GetInstance().GetSyncRunThreadNum();
const float block_size = 128.0;
size_t thread_num = count < block_size * max_thread_num ? std::ceil(count / block_size) : max_thread_num;
std::vector<common::Task> tasks;
size_t start = 0;
size_t once_compute_size = (lens + thread_num - 1) / thread_num;
if (once_compute_size < 1) {
MS_LOG(ERROR) << "Invalid value: once_compute_size " << once_compute_size;
return;
}
while (start < lens) {
size_t end = (start + once_compute_size) > lens ? lens : (start + once_compute_size);
if (operate_type_ == RELUGRAD) {
threads.emplace_back(std::thread(&EltWiseGradCPUKernel::ReluGrad<T>, this, input1, input2, output, start, end));
} else if (operate_type_ == RELU6GRAD) {
threads.emplace_back(std::thread(&EltWiseGradCPUKernel::ReLU6Grad<T>, this, input1, input2, output, start, end));
} else if (operate_type_ == ABSGRAD) {
threads.emplace_back(std::thread(&EltWiseGradCPUKernel::AbsGrad<T>, this, input1, input2, output, start, end));
} else if (operate_type_ == SIGMOIDGRAD) {
threads.emplace_back(
std::thread(&EltWiseGradCPUKernel::SigmoidGrad<T>, this, input1, input2, output, start, end));
} else if (operate_type_ == TANHGRAD) {
threads.emplace_back(std::thread(&EltWiseGradCPUKernel::TanhGrad<T>, this, input1, input2, output, start, end));
} else if (operate_type_ == SQRTGRAD) {
threads.emplace_back(std::thread(&EltWiseGradCPUKernel::SqrtGrad<T>, this, input1, input2, output, start, end));
} else if (operate_type_ == GELUGRAD) {
threads.emplace_back(std::thread(&EltWiseGradCPUKernel::GeluGrad<T>, this, input1, input2, output, start, end));
} else if (operate_type_ == ASINGRAD) {
threads.emplace_back(std::thread(&EltWiseGradCPUKernel::AsinGrad<T>, this, input1, input2, output, start, end));
} else if (operate_type_ == ACOSGRAD) {
threads.emplace_back(std::thread(&EltWiseGradCPUKernel::ACosGrad<T>, this, input1, input2, output, start, end));
} else if (operate_type_ == ATANGRAD) {
threads.emplace_back(std::thread(&EltWiseGradCPUKernel::AtanGrad<T>, this, input1, input2, output, start, end));
} else if (operate_type_ == ASINHGRAD) {
threads.emplace_back(std::thread(&EltWiseGradCPUKernel::AsinhGrad<T>, this, input1, input2, output, start, end));
} else if (operate_type_ == ACOSHGRAD) {
threads.emplace_back(std::thread(&EltWiseGradCPUKernel::AcoshGrad<T>, this, input1, input2, output, start, end));
} else {
MS_LOG(EXCEPTION) << "Not support " << operate_type_;
}
size_t once_compute_size = (count + thread_num - 1) / thread_num;
while (start < count) {
size_t end = (start + once_compute_size) > count ? count : (start + once_compute_size);
auto block = [&, start, end]() {
elt_map.at(kernel_name_)(this, input1, input2, output, start, end);
return common::SUCCESS;
};
tasks.emplace_back(block);
start += once_compute_size;
}
for (size_t i = 0; i < threads.size(); ++i) {
threads[i].join();
}
common::ThreadPool::GetInstance().SyncRun(tasks);
return true;
}
} // namespace kernel
} // namespace mindspore
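
The replacement Launch above swaps the OperateType enum chain and the hand-rolled std::thread loop for a string-keyed map of member-function pointers plus thread-pool tasks. Below is a minimal standalone sketch of that dispatch-and-chunk pattern; the class and names are illustrative, not MindSpore's, and plain std::thread stands in for common::ThreadPool:

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <thread>
#include <vector>

template <typename T>
class GradKernel {
 public:
  explicit GradKernel(std::string name) : name_(std::move(name)) {}

  // Look the op up by name once, then split [0, len) into contiguous chunks.
  void Launch(const T *in, const T *dy, T *out, size_t len, size_t thread_num) {
    static const std::map<std::string,
                          std::function<void(GradKernel *, const T *, const T *, T *, size_t, size_t)>>
        grad_map{{"AsinGrad", &GradKernel<T>::AsinGrad}, {"AtanGrad", &GradKernel<T>::AtanGrad}};
    auto fn = grad_map.at(name_);
    size_t chunk = (len + thread_num - 1) / thread_num;
    std::vector<std::thread> pool;
    for (size_t start = 0; start < len; start += chunk) {
      size_t end = std::min(len, start + chunk);
      pool.emplace_back([this, fn, in, dy, out, start, end] { fn(this, in, dy, out, start, end); });
    }
    for (auto &t : pool) t.join();
  }

 private:
  // dy / sqrt(1 - x^2): the AsinGrad body from the diff, minus the zero-divisor branches.
  void AsinGrad(const T *x, const T *dy, T *out, size_t start, size_t end) {
    for (size_t i = start; i < end; ++i) out[i] = dy[i] / std::sqrt(1 - x[i] * x[i]);
  }
  // dy / (1 + x^2), matching AtanGrad.
  void AtanGrad(const T *x, const T *dy, T *out, size_t start, size_t end) {
    for (size_t i = start; i < end; ++i) out[i] = dy[i] / (1 + x[i] * x[i]);
  }
  std::string name_;
};

int main() {
  std::vector<float> x{0.1f, 0.2f, 0.3f}, dy{1.f, 1.f, 1.f}, out(3);
  GradKernel<float>("AtanGrad").Launch(x.data(), dy.data(), out.data(), out.size(), 2);
  std::cout << out[0] << " " << out[1] << " " << out[2] << "\n";  // ~0.990 0.962 0.917
}

Keying the map by kernel name also removes the need to keep an OperateType enum in sync with InitKernel, which is why InitKernel now only records kernel_name_.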

View File

@ -18,11 +18,13 @@
#include <memory>
#include <vector>
#include <limits>
#include <string>
#include "backend/kernel_compiler/cpu/cpu_kernel.h"
#include "backend/kernel_compiler/cpu/cpu_kernel_factory.h"
namespace mindspore {
namespace kernel {
template <typename T>
class EltWiseGradCPUKernel : public CPUKernel {
public:
EltWiseGradCPUKernel() = default;
@ -32,95 +34,75 @@ class EltWiseGradCPUKernel : public CPUKernel {
bool Launch(const std::vector<AddressPtr> &inputs, const std::vector<AddressPtr> &workspace,
const std::vector<AddressPtr> &outputs) override;
template <typename T>
void LaunchKernel(const std::vector<AddressPtr> &inputs, const std::vector<AddressPtr> &outputs);
private:
template <typename T>
void ReluGrad(const T *input1, const T *input2, T *out, size_t start, size_t end);
template <typename T>
void ReLU6Grad(const T *input1, const T *input2, T *out, size_t start, size_t end);
template <typename T>
void AbsGrad(const T *input1, const T *input2, T *out, size_t start, size_t end);
template <typename T>
void SigmoidGrad(const T *input1, const T *input2, T *out, size_t start, size_t end);
template <typename T>
void SqrtGrad(const T *input1, const T *input2, T *out, size_t start, size_t end);
template <typename T>
void TanhGrad(const T *input1, const T *input2, T *out, size_t start, size_t end);
template <typename T>
void GeluGrad(const T *input1, const T *input2, T *out, size_t start, size_t end);
template <typename T>
void AsinGrad(const T *input1, const T *input2, T *out, size_t start, size_t end);
template <typename T>
void ACosGrad(const T *input1, const T *input2, T *out, size_t start, size_t end);
template <typename T>
void AtanGrad(const T *input1, const T *input2, T *out, size_t start, size_t end);
template <typename T>
void AsinhGrad(const T *input1, const T *input2, T *out, size_t start, size_t end);
template <typename T>
void AcoshGrad(const T *input1, const T *input2, T *out, size_t start, size_t end);
std::vector<size_t> input_shape0_;
std::vector<size_t> input_shape1_;
std::vector<size_t> input_element_num0_;
std::vector<size_t> input_element_num1_;
std::vector<size_t> output_shape_;
std::vector<size_t> output_element_num_;
OperateType operate_type_{RELUGRAD};
TypeId dtype_{kTypeUnknown};
std::string kernel_name_ = "";
};
MS_REG_CPU_KERNEL(
MS_REG_CPU_KERNEL_T(
ReluGrad,
KernelAttr().AddInputAttr(kNumberTypeFloat32).AddInputAttr(kNumberTypeFloat32).AddOutputAttr(kNumberTypeFloat32),
EltWiseGradCPUKernel);
MS_REG_CPU_KERNEL(
EltWiseGradCPUKernel, float);
MS_REG_CPU_KERNEL_T(
ReLU6Grad,
KernelAttr().AddInputAttr(kNumberTypeFloat32).AddInputAttr(kNumberTypeFloat32).AddOutputAttr(kNumberTypeFloat32),
EltWiseGradCPUKernel);
MS_REG_CPU_KERNEL(
EltWiseGradCPUKernel, float);
MS_REG_CPU_KERNEL_T(
AbsGrad,
KernelAttr().AddInputAttr(kNumberTypeFloat32).AddInputAttr(kNumberTypeFloat32).AddOutputAttr(kNumberTypeFloat32),
EltWiseGradCPUKernel);
MS_REG_CPU_KERNEL(
EltWiseGradCPUKernel, float);
MS_REG_CPU_KERNEL_T(
SigmoidGrad,
KernelAttr().AddInputAttr(kNumberTypeFloat32).AddInputAttr(kNumberTypeFloat32).AddOutputAttr(kNumberTypeFloat32),
EltWiseGradCPUKernel);
MS_REG_CPU_KERNEL(
EltWiseGradCPUKernel, float);
MS_REG_CPU_KERNEL_T(
SqrtGrad,
KernelAttr().AddInputAttr(kNumberTypeFloat32).AddInputAttr(kNumberTypeFloat32).AddOutputAttr(kNumberTypeFloat32),
EltWiseGradCPUKernel);
MS_REG_CPU_KERNEL(
EltWiseGradCPUKernel, float);
MS_REG_CPU_KERNEL_T(
TanhGrad,
KernelAttr().AddInputAttr(kNumberTypeFloat32).AddInputAttr(kNumberTypeFloat32).AddOutputAttr(kNumberTypeFloat32),
EltWiseGradCPUKernel);
MS_REG_CPU_KERNEL(GeLUGrad,
KernelAttr()
.AddInputAttr(kNumberTypeFloat32)
.AddInputAttr(kNumberTypeFloat32)
.AddInputAttr(kNumberTypeFloat32)
.AddOutputAttr(kNumberTypeFloat32),
EltWiseGradCPUKernel);
MS_REG_CPU_KERNEL(
EltWiseGradCPUKernel, float);
MS_REG_CPU_KERNEL_T(GeLUGrad,
KernelAttr()
.AddInputAttr(kNumberTypeFloat32)
.AddInputAttr(kNumberTypeFloat32)
.AddInputAttr(kNumberTypeFloat32)
.AddOutputAttr(kNumberTypeFloat32),
EltWiseGradCPUKernel, float);
MS_REG_CPU_KERNEL_T(
AsinGrad,
KernelAttr().AddInputAttr(kNumberTypeFloat32).AddInputAttr(kNumberTypeFloat32).AddOutputAttr(kNumberTypeFloat32),
EltWiseGradCPUKernel);
MS_REG_CPU_KERNEL(
EltWiseGradCPUKernel, float);
MS_REG_CPU_KERNEL_T(
ACosGrad,
KernelAttr().AddInputAttr(kNumberTypeFloat32).AddInputAttr(kNumberTypeFloat32).AddOutputAttr(kNumberTypeFloat32),
EltWiseGradCPUKernel);
MS_REG_CPU_KERNEL(
EltWiseGradCPUKernel, float);
MS_REG_CPU_KERNEL_T(
AtanGrad,
KernelAttr().AddInputAttr(kNumberTypeFloat32).AddInputAttr(kNumberTypeFloat32).AddOutputAttr(kNumberTypeFloat32),
EltWiseGradCPUKernel);
MS_REG_CPU_KERNEL(
EltWiseGradCPUKernel, float);
MS_REG_CPU_KERNEL_T(
AsinhGrad,
KernelAttr().AddInputAttr(kNumberTypeFloat32).AddInputAttr(kNumberTypeFloat32).AddOutputAttr(kNumberTypeFloat32),
EltWiseGradCPUKernel);
MS_REG_CPU_KERNEL(
EltWiseGradCPUKernel, float);
MS_REG_CPU_KERNEL_T(
AcoshGrad,
KernelAttr().AddInputAttr(kNumberTypeFloat32).AddInputAttr(kNumberTypeFloat32).AddOutputAttr(kNumberTypeFloat32),
EltWiseGradCPUKernel);
EltWiseGradCPUKernel, float);
} // namespace kernel
} // namespace mindspore
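
MS_REG_CPU_KERNEL_T differs from MS_REG_CPU_KERNEL only in the trailing type argument, which selects the template instantiation to register (float everywhere above). A hedged sketch of how such a macro can sit on top of a name-to-creator registry follows; this registry is illustrative, not MindSpore's actual kernel factory:

#include <functional>
#include <iostream>
#include <map>
#include <memory>
#include <string>

struct CPUKernelBase {
  virtual ~CPUKernelBase() = default;
  virtual void Run() = 0;
};

using Creator = std::function<std::unique_ptr<CPUKernelBase>()>;

std::map<std::string, Creator> &Registry() {
  static std::map<std::string, Creator> registry;
  return registry;
}

struct Registrar {
  Registrar(const std::string &name, Creator c) { Registry().emplace(name, std::move(c)); }
};

// KERNEL##_##T gives each (kernel, type) expansion a unique registrar object name.
#define REG_CPU_KERNEL_T(NAME, KERNEL, T) \
  static Registrar g_##KERNEL##_##T(#NAME, [] { return std::make_unique<KERNEL<T>>(); });

template <typename T>
struct EltWiseGrad : CPUKernelBase {
  void Run() override { std::cout << "running with sizeof(T)=" << sizeof(T) << "\n"; }
};

REG_CPU_KERNEL_T(ReluGrad, EltWiseGrad, float)

int main() { Registry().at("ReluGrad")()->Run(); }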

View File

@ -28,7 +28,7 @@ size_t get_element_num(const std::vector<size_t> &shape) {
}
template <typename T, typename I>
void CopyTask(size_t cur, std::vector<size_t> *pos, T *input, I *index, const int &dim, T *output,
void CopyTask(size_t cur, std::vector<size_t> *pos, T *input, const I *index, const int &dim, T *output,
const std::vector<size_t> &output_shape, const std::vector<size_t> &out_cargo_size,
const std::vector<size_t> &input_cargo_size, bool reverse) {
for (size_t i = 0; i < output_shape[cur]; ++i) {
@ -65,7 +65,6 @@ template <typename T, typename I>
void GatherDCPUKernel<T, I>::InitKernel(const CNodePtr &kernel_node) {
input_shape_ = AnfAlgo::GetInputDeviceShape(kernel_node, 0);
index_shape_ = AnfAlgo::GetInputDeviceShape(kernel_node, 2);
if (input_shape_.size() != index_shape_.size()) {
MS_LOG(EXCEPTION) << "Invalid shape size, shape size of input: " << input_shape_.size()
<< ", and index: " << index_shape_.size() << " should be equal";
@ -81,7 +80,6 @@ bool GatherDCPUKernel<T, I>::Launch(const std::vector<kernel::AddressPtr> &input
size_t index_size = get_element_num(index_shape_) * sizeof(I);
size_t dim_size = sizeof(int);
size_t output_size = get_element_num(output_shape_) * sizeof(T);
if (inputs[0]->size != input_size || inputs[1]->size != dim_size || inputs[2]->size != index_size ||
outputs[0]->size != output_size) {
MS_LOG(EXCEPTION) << "invalid input or output data size!";
@ -92,7 +90,6 @@ bool GatherDCPUKernel<T, I>::Launch(const std::vector<kernel::AddressPtr> &input
auto index = reinterpret_cast<I *>(inputs[2]->addr);
auto output = reinterpret_cast<T *>(outputs[0]->addr);
int32_t input_rank = SizeToInt(input_shape_.size());
if (dim[0] >= input_rank || dim[0] < -input_rank) {
MS_LOG(EXCEPTION) << "The value of 'dim' should be in [" << -input_rank << ", " << input_rank
<< "], but got: " << dim[0];

View File

@ -37,7 +37,6 @@ class GatherDCPUKernel : public CPUKernel {
std::vector<size_t> input_shape_;
std::vector<size_t> index_shape_;
std::vector<size_t> output_shape_;
int32_t axis_;
};
MS_REG_CPU_KERNEL_T_S(GatherD,

View File

@ -15,10 +15,10 @@
*/
#include "backend/kernel_compiler/cpu/gathernd_cpu_kernel.h"
#include "runtime/device/cpu/cpu_device_address.h"
#define MAX_INT (((unsigned int)(-1)) >> 1)
namespace mindspore {
namespace kernel {
void GatherNdCPUKernel::InitKernel(const CNodePtr &kernel_node) {
input_shapes_ = AnfAlgo::GetPrevNodeOutputInferShape(kernel_node, 0);
indices_shapes_ = AnfAlgo::GetPrevNodeOutputInferShape(kernel_node, 1);
@ -83,11 +83,14 @@ bool GatherNdCPUKernel::LaunchKernel(const std::vector<AddressPtr> &inputs, cons
size_t output_dim1 = dims_[1];
size_t indices_dim1 = dims_[2];
int num = output_dim0 * output_dim1;
size_t num = output_dim0 * output_dim1;
if (num > MAX_INT) {
MS_LOG(EXCEPTION) << "Exceed MAX_INT: " << MAX_INT << ", dim0: " << output_dim0 << ", dim1: " << output_dim1;
}
for (int write_index = 0; write_index < num; write_index++) {
int i = write_index / output_dim1 % output_dim0;
int j = write_index % output_dim1;
for (size_t write_index = 0; write_index < num; write_index++) {
size_t i = write_index / output_dim1 % output_dim0;
size_t j = write_index % output_dim1;
int read_index = 0;
for (size_t k = 0; k < indices_dim1; k++) {
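
The MAX_INT macro is all-ones shifted right once, i.e. 0x7FFFFFFF == INT_MAX for 32-bit int, and the new size_t loop is guarded so the product of the dims still fits the int-typed read_index. A tiny standalone check of both facts:

#include <climits>
#include <cstddef>
#include <iostream>

#define MAX_INT (((unsigned int)(-1)) >> 1)

int main() {
  static_assert(MAX_INT == INT_MAX, "right-shifted all-ones equals INT_MAX");
  size_t output_dim0 = 70000, output_dim1 = 70000;  // hypothetical dims
  size_t num = output_dim0 * output_dim1;           // ~4.9e9: would overflow int
  std::cout << (num > MAX_INT ? "reject: exceeds MAX_INT\n" : "ok\n");
}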

View File

@ -90,6 +90,5 @@ void IsFiniteCPUKernel::LaunchKernelOther(const std::vector<AddressPtr> &inputs,
output[i] = true;
}
}
} // namespace kernel
} // namespace mindspore

View File

@ -86,7 +86,6 @@ MS_REG_CPU_KERNEL(IsFinite, KernelAttr().AddInputAttr(kNumberTypeUInt32).AddOutp
MS_REG_CPU_KERNEL(IsFinite, KernelAttr().AddInputAttr(kNumberTypeUInt64).AddOutputAttr(kNumberTypeBool),
IsFiniteCPUKernel);
} // namespace kernel
} // namespace mindspore

View File

@ -63,7 +63,7 @@ void MaximumGradRecTask(T *x, T *y, T *dout, T *dx, T *dy, size_t dim, size_t x_
size_t dout_i = i * dout_cargo[dim];
if (dim == dout_shape.size() - 1) {
if (*(x + x_index + x_i) >= *(y + y_index + y_i)) {
if (*(x + x_index + x_i) > *(y + y_index + y_i)) {
*(dx + x_index + x_i) += *(dout + dout_index + i);
} else {
*(dy + y_index + y_i) += *(dout + dout_index + i);
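
Tightening `>=` to `>` changes gradient routing only at exact ties: previously dx absorbed dout when x == y, now dy does. A minimal repro of the tie case (standalone, not the recursive kernel):

#include <iostream>

int main() {
  float x = 2.f, y = 2.f, dout = 1.f, dx = 0.f, dy = 0.f;
  if (x > y) dx += dout; else dy += dout;  // the tie now routes to dy
  std::cout << "dx=" << dx << " dy=" << dy << "\n";  // dx=0 dy=1
}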

View File

@ -19,7 +19,6 @@
namespace mindspore {
namespace kernel {
template <typename T>
void MinimumCPUKernel<T>::InitKernel(const CNodePtr &kernel_node) {
CheckParam(kernel_node);
@ -147,7 +146,7 @@ void MinimumCPUKernel<T>::InitTensorBroadcastShape() {
}
}
// Broadcast comparation
// Broadcast comparison
template <typename T>
size_t MinimumCPUKernel<T>::Index(const size_t &index, const size_t &dim) {
return dim == 1 ? 0 : index;
@ -216,6 +215,5 @@ void MinimumCPUKernel<T>::BroadcastArithTensors(const T *input_x, const T *input
output[i] = MinimumFunc(input_x[i], input_y[i]);
}
}
} // namespace kernel
} // namespace mindspore
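
The Index helper collapses a broadcast dimension of size 1 to offset 0, which is all per-axis NumPy-style broadcasting needs. A small elementwise minimum over a [2, 1] operand broadcast against [2, 3] (a standalone sketch, not the kernel's BroadcastArith path):

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

size_t Index(size_t i, size_t dim) { return dim == 1 ? 0 : i; }

int main() {
  std::vector<int> x{10, 20};            // shape [2,1]
  std::vector<int> y{1, 2, 3, 4, 5, 6};  // shape [2,3]
  for (size_t i = 0; i < 2; ++i)
    for (size_t j = 0; j < 3; ++j) {
      // x's second axis has size 1, so its j-coordinate collapses to 0
      int v = std::min(x[i * 1 + Index(j, 1)], y[i * 3 + j]);
      std::cout << v << (j == 2 ? '\n' : ' ');  // 1 2 3 / 4 5 6
    }
}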

View File

@ -83,10 +83,11 @@ bool MinimumGradCPUKernel::Launch(const std::vector<kernel::AddressPtr> &inputs,
}
template <typename T>
void MinimumGradRecTask(T *x, T *y, T *dout, T *dx, T *dy, size_t dim, size_t x_index, size_t y_index,
size_t dout_index, const std::vector<size_t> &x_cargo, const std::vector<size_t> &y_cargo,
const std::vector<size_t> &dout_cargo, const std::vector<size_t> &x_shape,
const std::vector<size_t> &y_shape, const std::vector<size_t> &dout_shape) {
void MinimumGradRecTask(const T *x, const T *y, const T *dout, T *dx, T *dy, const size_t dim, const size_t x_index,
const size_t y_index, const size_t dout_index, const std::vector<size_t> &x_cargo,
const std::vector<size_t> &y_cargo, const std::vector<size_t> &dout_cargo,
const std::vector<size_t> &x_shape, const std::vector<size_t> &y_shape,
const std::vector<size_t> &dout_shape) {
for (size_t i = 0; i < dout_shape[dim]; i++) {
size_t x_i = x_shape[dim] == dout_shape[dim] ? i * x_cargo[dim] : 0;
size_t y_i = y_shape[dim] == dout_shape[dim] ? i * y_cargo[dim] : 0;
@ -115,8 +116,8 @@ void MinimumGradCPUKernel::LaunchKernel(const std::vector<AddressPtr> &inputs, c
size_t x_tensor_len = GetTensorLen(x_shape_);
size_t y_tensor_len = GetTensorLen(y_shape_);
memset(dx_addr, 0, x_tensor_len * sizeof(T));
memset(dy_addr, 0, y_tensor_len * sizeof(T));
memset_s(dx_addr, x_tensor_len * sizeof(T), 0x00, x_tensor_len * sizeof(T));
memset_s(dy_addr, y_tensor_len * sizeof(T), 0x00, y_tensor_len * sizeof(T));
std::vector<size_t> x_shape(dout_shape.size(), 1);
std::vector<size_t> y_shape(dout_shape.size(), 1);
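
memset_s takes the destination capacity as an explicit bound and fails instead of overrunning when the fill count is too large; the kernel presumably links a securec-style implementation. A portable stand-in with the same contract, zero-filling the way the hunk does:

#include <cstddef>
#include <cstdio>
#include <cstring>

// Checked helper with memset_s semantics, for toolchains without C11 Annex K.
int memset_s_like(void *dest, size_t destsz, int ch, size_t count) {
  if (dest == nullptr || count > destsz) return -1;  // refuse unsafe writes
  memset(dest, ch, count);
  return 0;
}

int main() {
  float dx[8];
  size_t len = sizeof(dx);
  if (memset_s_like(dx, len, 0x00, len) != 0) return 1;  // zero dx like the kernel
  printf("dx[0]=%g\n", dx[0]);  // 0
}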

View File

@ -187,6 +187,5 @@ void MirrorPadCPUKernel::CheckParam(const CNodePtr &kernel_node) {
MS_LOG(EXCEPTION) << "Output number is " << output_num << ", but MirrorPadCPUKernel needs 1 output.";
}
}
} // namespace kernel
} // namespace mindspore

View File

@ -136,7 +136,7 @@ void MirrorPadGradCPUKernel::InitInputOutputSize(const CNodePtr &kernel_node) {
}
template <typename T>
void MirrorPadGradCPUKernel::MirrorPadGrad_Width_Height(const size_t size, const T *dy, T *interim_dy,
void MirrorPadGradCPUKernel::MirrorPadGrad_Width_Height(const size_t size, const T *dy, const T *interim_dy,
const int dx_batches, const int dx_channels,
const int dx_height, const int dx_width, const int dy_height,
const int dy_width, const int padd_dim,

View File

@ -58,14 +58,14 @@ class MirrorPadGradCPUKernel : public CPUKernel {
const std::vector<AddressPtr> &outputs);
template <typename T>
void MirrorPadGrad_Width_Height(const size_t size, const T *dy, T *interim_dy, const int dx_batches,
void MirrorPadGrad_Width_Height(const size_t size, const T *dy, const T *interim_dy, const int dx_batches,
const int dx_channels, const int dx_height, const int dx_width, const int dy_height,
const int dy_width, const int padd_dim, const int64_t *paddings_arg, int mode, T *dx);
template <typename T>
void MirrorPadGradBatchChannel(const size_t size, T *dy, T *interim_dy, const int dx_batches, const int dx_channels,
const int dx_height, const int dx_width, const int dy_height, const int dy_width,
const int padd_dim, const int64_t *paddings_arg, int mode, T *dx);
const int padd_dim, const int64_t *paddings_arg, int mode, T *const dx);
private:
void CheckParam(const CNodePtr &kernel_node);
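
The signature change is pure const-correctness: `const T *p` promises not to write through p, while `T *const p` fixes the pointer itself but not the pointee. A two-line illustration:

#include <cstdio>

void demo(const float *in, float *const out) {
  // *in = 1.f;        // would not compile: the pointee is const
  out[0] = in[0] * 2;  // writing through out is fine; reassigning out would not compile
}

int main() {
  float a = 3.f, b = 0.f;
  demo(&a, &b);
  printf("%g\n", b);  // 6
}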

View File

@ -1,5 +1,5 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
* Copyright 2021 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@ -14,14 +14,14 @@
* limitations under the License.
*/
#include <string>
#include "backend/kernel_compiler/cpu/mkldnn/fused_batch_norm_cpu_kernel.h"
#include "backend/kernel_compiler/cpu/mkldnn/batch_norm_cpu_kernel.h"
#include "backend/kernel_compiler/cpu/mkldnn/mkl_kernel_engine.h"
#include "runtime/device/cpu/cpu_device_address.h"
#include "utils/ms_utils.h"
namespace mindspore {
namespace kernel {
void FusedBatchNormCPUKernel::InitInputOutputSize(const CNodePtr &kernel_node) {
void BatchNormCPUKernel::InitInputOutputSize(const CNodePtr &kernel_node) {
CPUKernel::InitInputOutputSize(kernel_node);
MS_EXCEPTION_IF_NULL(kernel_node);
size_t type_size = sizeof(float);
@ -30,16 +30,13 @@ void FusedBatchNormCPUKernel::InitInputOutputSize(const CNodePtr &kernel_node) {
workspace_size_list_.emplace_back(tensor_size);
}
void FusedBatchNormCPUKernel::InitKernel(const CNodePtr &kernel_node) {
void BatchNormCPUKernel::InitKernel(const CNodePtr &kernel_node) {
MS_EXCEPTION_IF_NULL(kernel_node);
auto node_name = AnfAlgo::GetCNodeName(kernel_node);
if (node_name == "FusedBatchNorm") {
momentum = AnfAlgo::GetNodeAttr<float>(kernel_node, "momentum");
is_train = true;
}
is_train = AnfAlgo::GetNodeAttr<bool>(kernel_node, "is_training");
momentum = AnfAlgo::GetNodeAttr<float>(kernel_node, "momentum");
std::vector<size_t> x_shape = AnfAlgo::GetInputDeviceShape(kernel_node, 0);
if (x_shape.size() != 4) {
MS_LOG(EXCEPTION) << "Fused batchnorm only support nchw input!";
MS_LOG(EXCEPTION) << "Batchnorm only support nchw input!";
}
batch_size = x_shape[0];
channel = x_shape[1];
@ -66,9 +63,9 @@ void FusedBatchNormCPUKernel::InitKernel(const CNodePtr &kernel_node) {
AddArgument(DNNL_ARG_DST, x_desc);
}
bool FusedBatchNormCPUKernel::Launch(const std::vector<kernel::AddressPtr> &inputs,
const std::vector<kernel::AddressPtr> &workspace,
const std::vector<kernel::AddressPtr> &outputs) {
bool BatchNormCPUKernel::Launch(const std::vector<kernel::AddressPtr> &inputs,
const std::vector<kernel::AddressPtr> &workspace,
const std::vector<kernel::AddressPtr> &outputs) {
if (inputs.size() < 5 || outputs.empty()) {
MS_LOG(EXCEPTION) << "Error input output size!";
}
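
The renamed kernel still accepts only 4-D NCHW input and normalizes per channel. For reference, the inference-mode math the oneDNN primitive computes, y = gamma * (x - mean) / sqrt(var + eps) + beta, in plain C++ (a sketch, independent of the dnnl calls above):

#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

void BatchNormNCHW(const std::vector<float> &x, const std::vector<float> &gamma,
                   const std::vector<float> &beta, const std::vector<float> &mean,
                   const std::vector<float> &var, size_t n, size_t c, size_t h, size_t w,
                   float eps, std::vector<float> *y) {
  for (size_t ni = 0; ni < n; ++ni)
    for (size_t ci = 0; ci < c; ++ci) {
      float inv_std = 1.0f / std::sqrt(var[ci] + eps);  // per-channel statistics
      for (size_t i = 0; i < h * w; ++i) {
        size_t off = ((ni * c + ci) * h * w) + i;
        (*y)[off] = gamma[ci] * (x[off] - mean[ci]) * inv_std + beta[ci];
      }
    }
}

int main() {
  std::vector<float> x{1, 2, 3, 4}, y(4);  // n=1, c=1, h=2, w=2
  BatchNormNCHW(x, {1.f}, {0.f}, {2.5f}, {1.25f}, 1, 1, 2, 2, 1e-5f, &y);
  std::cout << y[0] << " " << y[3] << "\n";  // ~-1.34 and ~1.34
}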

View File

@ -1,5 +1,5 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
* Copyright 2021 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@ -13,18 +13,18 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_FUSED_BATCH_NORM_CPU_KERNEL_H_
#define MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_FUSED_BATCH_NORM_CPU_KERNEL_H_
#ifndef MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_BATCH_NORM_CPU_KERNEL_H_
#define MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_BATCH_NORM_CPU_KERNEL_H_
#include <memory>
#include <vector>
#include "backend/kernel_compiler/cpu/mkldnn/mkl_cpu_kernel.h"
namespace mindspore {
namespace kernel {
class FusedBatchNormCPUKernel : public MKLCPUKernel {
class BatchNormCPUKernel : public MKLCPUKernel {
public:
FusedBatchNormCPUKernel() = default;
~FusedBatchNormCPUKernel() override = default;
BatchNormCPUKernel() = default;
~BatchNormCPUKernel() override = default;
void InitKernel(const CNodePtr &kernel_node) override;
@ -43,20 +43,6 @@ class FusedBatchNormCPUKernel : public MKLCPUKernel {
size_t nhw_size{0};
};
MS_REG_CPU_KERNEL(FusedBatchNorm,
KernelAttr()
.AddInputAttr(kNumberTypeFloat32)
.AddInputAttr(kNumberTypeFloat32)
.AddInputAttr(kNumberTypeFloat32)
.AddInputAttr(kNumberTypeFloat32)
.AddInputAttr(kNumberTypeFloat32)
.AddOutputAttr(kNumberTypeFloat32)
.AddOutputAttr(kNumberTypeFloat32)
.AddOutputAttr(kNumberTypeFloat32)
.AddOutputAttr(kNumberTypeFloat32)
.AddOutputAttr(kNumberTypeFloat32),
FusedBatchNormCPUKernel)
MS_REG_CPU_KERNEL(BatchNorm,
KernelAttr()
.AddInputAttr(kNumberTypeFloat32)
@ -69,7 +55,7 @@ MS_REG_CPU_KERNEL(BatchNorm,
.AddOutputAttr(kNumberTypeFloat32)
.AddOutputAttr(kNumberTypeFloat32)
.AddOutputAttr(kNumberTypeFloat32),
FusedBatchNormCPUKernel)
BatchNormCPUKernel)
} // namespace kernel
} // namespace mindspore

View File

@ -1,5 +1,5 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
* Copyright 2021 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@ -13,7 +13,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "backend/kernel_compiler/cpu/mkldnn/fused_batch_norm_gard_cpu_kernel.h"
#include "backend/kernel_compiler/cpu/mkldnn/batch_norm_gard_cpu_kernel.h"
#include <string>
#include "backend/kernel_compiler/cpu/mkldnn/mkl_kernel_engine.h"
@ -22,19 +22,20 @@
namespace mindspore {
namespace kernel {
void FusedBatchNormGradCPUKernel::InitInputOutputSize(const CNodePtr &kernel_node) {
void BatchNormGradCPUKernel::InitInputOutputSize(const CNodePtr &kernel_node) {
CPUKernel::InitInputOutputSize(kernel_node);
MS_EXCEPTION_IF_NULL(kernel_node);
size_t type_size = sizeof(float);
std::vector<size_t> shape = AnfAlgo::GetInputDeviceShape(kernel_node, 0);
size_t tensor_size = shape[1] * 2 * type_size;
input_size_list_.pop_back();
// [2, c] to store scale and bias
workspace_size_list_.emplace_back(tensor_size);
// [2, c] to store diff_scale and diff_bias
workspace_size_list_.emplace_back(tensor_size);
}
void FusedBatchNormGradCPUKernel::InitKernel(const CNodePtr &kernel_node) {
void BatchNormGradCPUKernel::InitKernel(const CNodePtr &kernel_node) {
MS_EXCEPTION_IF_NULL(kernel_node);
std::vector<size_t> x_shape = AnfAlgo::GetInputDeviceShape(kernel_node, 0);
if (x_shape.size() != 4) {
@ -72,25 +73,25 @@ void FusedBatchNormGradCPUKernel::InitKernel(const CNodePtr &kernel_node) {
AddArgument(DNNL_ARG_DIFF_SCALE_SHIFT, scale_bias_desc);
}
bool FusedBatchNormGradCPUKernel::Launch(const std::vector<kernel::AddressPtr> &inputs,
const std::vector<kernel::AddressPtr> &workspace,
const std::vector<kernel::AddressPtr> &outputs) {
bool BatchNormGradCPUKernel::Launch(const std::vector<kernel::AddressPtr> &inputs,
const std::vector<kernel::AddressPtr> &workspace,
const std::vector<kernel::AddressPtr> &outputs) {
if (inputs.size() < 5 || outputs.empty()) {
MS_LOG(EXCEPTION) << "Error input output size!";
}
auto wksp_in = reinterpret_cast<float *>(workspace[0]->addr);
auto scale_ret = memcpy_s(wksp_in, workspace[0]->size, inputs[2]->addr, inputs[2]->size);
auto max_size = workspace[0]->size - inputs[2]->size;
auto bias_ret = memcpy_s(wksp_in + (inputs[2]->size / sizeof(float)), max_size, inputs[3]->addr, inputs[3]->size);
if (scale_ret != 0 || bias_ret != 0) {
auto bias_ret = memset_s(wksp_in + (inputs[2]->size / sizeof(float)), max_size, 0., max_size);
if (scale_ret != 0 && bias_ret != 0) {
MS_LOG(EXCEPTION) << "Memcpy_s error.";
return false;
}
SetArgumentHandle(DNNL_ARG_DIFF_DST, inputs[0]->addr);
SetArgumentHandle(DNNL_ARG_SRC, inputs[1]->addr);
SetArgumentHandle(DNNL_ARG_MEAN, inputs[4]->addr);
SetArgumentHandle(DNNL_ARG_VARIANCE, inputs[5]->addr);
SetArgumentHandle(DNNL_ARG_MEAN, inputs[3]->addr);
SetArgumentHandle(DNNL_ARG_VARIANCE, inputs[4]->addr);
SetArgumentHandle(DNNL_ARG_SCALE_SHIFT, workspace[0]->addr);
SetArgumentHandle(DNNL_ARG_DIFF_SRC, outputs[0]->addr);
SetArgumentHandle(DNNL_ARG_DIFF_SCALE_SHIFT, workspace[1]->addr);
@ -99,7 +100,7 @@ bool FusedBatchNormGradCPUKernel::Launch(const std::vector<kernel::AddressPtr> &
auto wksp_out = reinterpret_cast<float *>(workspace[1]->addr);
auto diff_scale_ret = memcpy_s(outputs[1]->addr, outputs[1]->size, wksp_out, inputs[2]->size);
auto diff_bias_ret =
memcpy_s(outputs[2]->addr, outputs[2]->size, wksp_out + (outputs[1]->size / sizeof(float)), inputs[3]->size);
memcpy_s(outputs[2]->addr, outputs[2]->size, wksp_out + (outputs[1]->size / sizeof(float)), outputs[2]->size);
if (diff_scale_ret != 0 || diff_bias_ret != 0) {
MS_LOG(EXCEPTION) << "Memcpy_s error.";
return false;
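
The grad kernel packs scale into the first half of a [2, C] workspace and, after this change, zero-fills the bias half instead of copying it; memcpy_s's second argument is the remaining destination capacity. A standalone sketch of that packing, with a checked helper standing in for securec's memcpy_s:

#include <cstddef>
#include <cstdio>
#include <cstring>

int memcpy_s_like(void *dest, size_t destsz, const void *src, size_t count) {
  if (dest == nullptr || src == nullptr || count > destsz) return -1;  // bounded copy
  memcpy(dest, src, count);
  return 0;
}

int main() {
  const size_t kC = 4;  // hypothetical channel count
  float scale[kC] = {1, 1, 1, 1}, wksp[2 * kC];
  size_t bytes = sizeof(scale);
  // scale occupies the first half; the bias half is zero-filled, mirroring the memset_s above
  if (memcpy_s_like(wksp, sizeof(wksp), scale, bytes) != 0) return 1;
  memset(wksp + kC, 0, sizeof(wksp) - bytes);
  printf("wksp[0]=%g wksp[%zu]=%g\n", wksp[0], kC, wksp[kC]);  // 1 and 0
}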

View File

@ -1,5 +1,5 @@
/**
* Copyright 2020 Huawei Technologies Co., Ltd
* Copyright 2021 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@ -13,18 +13,18 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_FUSED_BATCH_NORM_GRAD_CPU_KERNEL_H_
#define MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_FUSED_BATCH_NORM_GRAD_CPU_KERNEL_H_
#ifndef MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_BATCH_NORM_GRAD_CPU_KERNEL_H_
#define MINDSPORE_CCSRC_BACKEND_KERNEL_COMPILER_CPU_BATCH_NORM_GRAD_CPU_KERNEL_H_
#include <memory>
#include <vector>
#include "backend/kernel_compiler/cpu/mkldnn/mkl_cpu_kernel.h"
namespace mindspore {
namespace kernel {
class FusedBatchNormGradCPUKernel : public MKLCPUKernel {
class BatchNormGradCPUKernel : public MKLCPUKernel {
public:
FusedBatchNormGradCPUKernel() = default;
~FusedBatchNormGradCPUKernel() override = default;
BatchNormGradCPUKernel() = default;
~BatchNormGradCPUKernel() override = default;
void InitKernel(const CNodePtr &kernel_node) override;
@ -42,7 +42,7 @@ class FusedBatchNormGradCPUKernel : public MKLCPUKernel {
size_t nhw_size{0};
};
MS_REG_CPU_KERNEL(FusedBatchNormGradCPU,
MS_REG_CPU_KERNEL(BatchNormGrad,
KernelAttr()
.AddInputAttr(kNumberTypeFloat32)
.AddInputAttr(kNumberTypeFloat32)
@ -53,7 +53,7 @@ MS_REG_CPU_KERNEL(FusedBatchNormGradCPU,
.AddOutputAttr(kNumberTypeFloat32)
.AddOutputAttr(kNumberTypeFloat32)
.AddOutputAttr(kNumberTypeFloat32),
FusedBatchNormGradCPUKernel)
BatchNormGradCPUKernel)
} // namespace kernel
} // namespace mindspore

View File

@ -0,0 +1,55 @@
/**
* Copyright 2021 Huawei Technologies Co., Ltd
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "backend/kernel_compiler/cpu/mkldnn/log_softmax_cpu_kernel.h"
#include <algorithm>
#include "backend/kernel_compiler/cpu/mkldnn/mkl_kernel_engine.h"
#include "runtime/device/cpu/cpu_device_address.h"
#include "utils/ms_utils.h"
namespace mindspore {
namespace kernel {
void LogSoftmaxCPUKernel::InitKernel(const CNodePtr &kernel_node) {
MS_EXCEPTION_IF_NULL(kernel_node);
std::vector<size_t> src_shape = AnfAlgo::GetInputDeviceShape(kernel_node, 0);
int axis = AnfAlgo::GetNodeAttr<int64_t>(kernel_node, AXIS);
if (axis >= SizeToInt(src_shape.size())) {
axis = SizeToInt(src_shape.size()) - 1;
}
while (axis < 0) {
axis += SizeToInt(src_shape.size());
}
dnnl::memory::desc src_desc = GetDefaultMemDesc(src_shape);
dnnl::logsoftmax_forward::desc desc =
dnnl::logsoftmax_forward::desc(dnnl::prop_kind::forward_training, src_desc, axis);
auto prim_desc = dnnl::logsoftmax_forward::primitive_desc(desc, MKLKernelEngine::Get().engine());
primitive_ = std::make_shared<dnnl::logsoftmax_forward>(prim_desc);
AddArgument(DNNL_ARG_SRC, src_desc);
AddArgument(DNNL_ARG_DST, src_desc);
}
bool LogSoftmaxCPUKernel::Launch(const std::vector<kernel::AddressPtr> &inputs,
const std::vector<kernel::AddressPtr> & /*workspace*/,
const std::vector<kernel::AddressPtr> &outputs) {
if (inputs.empty() || outputs.empty()) {
MS_LOG(EXCEPTION) << "log softmax error input output size!";
}
SetArgumentHandle(DNNL_ARG_SRC, inputs[0]->addr);
SetArgumentHandle(DNNL_ARG_DST, outputs[0]->addr);
ExecutePrimitive();
return true;
}
} // namespace kernel
} // namespace mindspore
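
InitKernel clamps an out-of-range axis to rank - 1 and wraps negative values before handing the math to dnnl::logsoftmax_forward. For reference, a numerically stable plain-C++ log-softmax over one axis, log_softmax(x) = x - max(x) - log(sum(exp(x - max(x)))):

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

std::vector<float> LogSoftmax1D(const std::vector<float> &x) {
  float mx = x[0];
  for (float v : x) mx = std::max(mx, v);  // subtract the max for stability
  float sum = 0.f;
  for (float v : x) sum += std::exp(v - mx);
  float lse = mx + std::log(sum);          // log-sum-exp
  std::vector<float> y(x.size());
  for (size_t i = 0; i < x.size(); ++i) y[i] = x[i] - lse;
  return y;
}

int main() {
  int axis = -1, rank = 2;
  while (axis < 0) axis += rank;         // same wrap-around as InitKernel
  std::cout << "axis=" << axis << "\n";  // 1
  for (float v : LogSoftmax1D({1.f, 2.f, 3.f})) std::cout << v << " ";
  std::cout << "\n";  // ~-2.408 -1.408 -0.408
}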

Some files were not shown because too many files have changed in this diff.