diff --git a/README.md b/README.md
index 50e21fae70d..108b800d2a3 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,9 @@
 ![MindSpore Logo](docs/MindSpore-logo.png "MindSpore logo")
 ============================================================
 
-- [What Is MindSpore?](#what-is-mindspore)
+[查看中文](./README_CN.md)
+
+- [What Is MindSpore](#what-is-mindspore)
     - [Automatic Differentiation](#automatic-differentiation)
     - [Automatic Parallel](#automatic-parallel)
 - [Installation](#installation)
diff --git a/README_CN.md b/README_CN.md
new file mode 100644
index 00000000000..68c9fcddd64
--- /dev/null
+++ b/README_CN.md
@@ -0,0 +1,220 @@
+![MindSpore标志](docs/MindSpore-logo.png "MindSpore logo")
+============================================================
+
+[View English](./README.md)
+
+- [MindSpore介绍](#mindspore介绍)
+    - [自动微分](#自动微分)
+    - [自动并行](#自动并行)
+- [安装](#安装)
+    - [二进制文件](#二进制文件)
+    - [来源](#来源)
+    - [Docker镜像](#docker镜像)
+- [快速入门](#快速入门)
+- [文档](#文档)
+- [社区](#社区)
+    - [治理](#治理)
+    - [交流](#交流)
+- [贡献](#贡献)
+- [版本说明](#版本说明)
+- [许可证](#许可证)
+
+## MindSpore介绍
+
+MindSpore是一种适用于端边云场景的新型开源深度学习训练/推理框架。
+MindSpore提供了友好的设计和高效的执行，旨在提升数据科学家和算法工程师的开发体验，并为Ascend AI处理器提供原生支持，以及软硬件协同优化。
+
+
+同时，MindSpore作为全球AI开源社区，致力于进一步开发和丰富AI软硬件应用生态。
+
+
+
+<img src="docs/MindSpore-architecture.png" alt="MindSpore Architecture" width="600"/>
+
+欲了解更多详情，请查看我们的[总体架构](https://www.mindspore.cn/docs/zh-CN/master/architecture.html)。
+
+### 自动微分
+
+当前主流深度学习框架中有三种自动微分技术：
+
+- **基于静态计算图的转换**：编译时将网络转换为静态数据流图，将链式法则应用于数据流图，实现自动微分。
+- **基于动态计算图的转换**：通过算子重载记录网络在正向执行时的运行轨迹，对动态生成的数据流图应用链式法则，实现自动微分。
+- **基于源码的转换**：该技术由函数式编程框架演进而来，以即时编译（Just-in-time Compilation，JIT）的形式对中间表示（程序在编译过程中的表示形式）进行自动微分转换，支持复杂的控制流场景、高阶函数和闭包。
+
+TensorFlow早期采用的是静态计算图，PyTorch采用的是动态计算图。静态图可以利用静态编译技术来优化网络性能，但是构建网络或调试网络非常复杂。动态图的使用非常方便，但很难实现性能的极限优化。
+
+MindSpore找到了另一种方法，即基于源码转换的自动微分。一方面，它支持控制流的自动微分，因此可以像PyTorch一样方便地构建模型。另一方面，MindSpore可以对神经网络进行静态编译优化，以获得更好的性能。
+
+<img src="docs/Automatic-differentiation.png" alt="Automatic Differentiation" width="600"/>
+
+MindSpore自动微分的实现可以理解为对程序本身的符号微分。MindSpore IR是一种函数式中间表示，它与基础代数中的复合函数具有直观的对应关系。复合函数的公式由任意可导的基础函数组成。MindSpore IR中的每个原语操作都可以对应基础代数中的基本函数，从而可以构建更复杂的控制流。
+
+### 自动并行
+
+MindSpore自动并行的目的是构建数据并行、模型并行和混合并行相结合的训练方法。该方法能够自动选择开销最小的模型切分策略，实现自动分布式并行训练。
+
+<img src="docs/Automatic-parallel.png" alt="Automatic Parallel" width="600"/>
+
+目前MindSpore采用的是算子切分的细粒度并行策略，即图中的每个算子被切分为一个集群，完成并行操作。在此期间的切分策略可能非常复杂，但是作为一名Python开发者，您无需关注底层实现，只要顶层API计算是有效的即可。
+
+## 安装
+
+### 二进制文件
+
+MindSpore提供跨多个后端的构建选项：
+
+| 硬件平台          | 操作系统            | 状态   |
+| :------------ | :-------------- | :--- |
+| Ascend 910    | Ubuntu-x86      | ✔️   |
+|               | EulerOS-x86     | ✔️   |
+|               | EulerOS-aarch64 | ✔️   |
+| GPU CUDA 10.1 | Ubuntu-x86      | ✔️   |
+| CPU           | Ubuntu-x86      | ✔️   |
+|               | Windows-x86     | ✔️   |
+
+使用`pip`命令安装，以`CPU`和`Ubuntu-x86`版本为例：
+
+1. 请从[MindSpore下载页面](https://www.mindspore.cn/versions)下载并安装whl包。
+
+    ```
+    pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/0.6.0-beta/MindSpore/cpu/ubuntu_x86/mindspore-0.6.0-cp37-cp37m-linux_x86_64.whl
+    ```
+
+2. 执行以下命令，验证安装结果。
+
+    ```python
+    import numpy as np
+    import mindspore.context as context
+    import mindspore.nn as nn
+    from mindspore import Tensor
+    from mindspore.ops import operations as P
+    
+    context.set_context(mode=context.GRAPH_MODE, device_target="CPU")
+    
+    class Mul(nn.Cell):
+        def __init__(self):
+            super(Mul, self).__init__()
+            self.mul = P.Mul()
+    
+        def construct(self, x, y):
+            return self.mul(x, y)
+    
+    x = Tensor(np.array([1.0, 2.0, 3.0]).astype(np.float32))
+    y = Tensor(np.array([4.0, 5.0, 6.0]).astype(np.float32))
+    
+    mul = Mul()
+    print(mul(x, y))
+    ```
+    ```
+    [ 4. 10. 18.]
+    ```
+### 来源
+
+[MindSpore安装](https://www.mindspore.cn/install)。
+
+### Docker镜像
+
+MindSpore的Docker镜像托管在[Docker Hub](https://hub.docker.com/r/mindspore)上。
+目前容器化构建选项支持情况如下：
+
+| 硬件平台   | Docker镜像仓库                | 标签                       | 说明                                       |
+| :----- | :------------------------ | :----------------------- | :--------------------------------------- |
+| CPU    | `mindspore/mindspore-cpu` | `x.y.z`                  | 已经预安装MindSpore `x.y.z` CPU版本的生产环境。       |
+|        |                           | `devel`                  | 提供开发环境从源码构建MindSpore（`CPU`后端）。安装详情请参考https://www.mindspore.cn/install。 |
+|        |                           | `runtime`                | 提供运行时环境安装MindSpore二进制包（`CPU`后端）。         |
+| GPU    | `mindspore/mindspore-gpu` | `x.y.z`                  | 已经预安装MindSpore `x.y.z` GPU版本的生产环境。       |
+|        |                           | `devel`                  | 提供开发环境从源码构建MindSpore（`GPU CUDA10.1`后端）。安装详情请参考https://www.mindspore.cn/install。 |
+|        |                           | `runtime`                | 提供运行时环境安装MindSpore二进制包（`GPU CUDA10.1`后端）。 |
+| Ascend | <center>&mdash;</center>  | <center>&mdash;</center> | 即将推出，敬请期待。                               |
+
+> **注意：** 对于GPU `devel` Docker镜像，不建议在从源码构建后直接安装whl包。我们强烈建议您将whl包传输到GPU `runtime` Docker镜像中再安装。
+
+* CPU
+
+    对于`CPU`后端，可以直接使用以下命令获取并运行最新的稳定镜像：
+    ```
+    docker pull mindspore/mindspore-cpu:0.6.0-beta
+    docker run -it mindspore/mindspore-cpu:0.6.0-beta /bin/bash
+    ```
+
+* GPU
+
+    对于`GPU`后端，请确保`nvidia-container-toolkit`已经提前安装，以下是`Ubuntu`用户安装指南：
+    ```
+    DISTRIBUTION=$(. /etc/os-release; echo $ID$VERSION_ID)
+    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
+    curl -s -L https://nvidia.github.io/nvidia-docker/$DISTRIBUTION/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
+
+    sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit nvidia-docker2
+    sudo systemctl restart docker
+    ```
+
+    使用以下命令获取并运行最新的稳定镜像：
+    ```
+    docker pull mindspore/mindspore-gpu:0.6.0-beta
+    docker run -it --runtime=nvidia --privileged=true mindspore/mindspore-gpu:0.6.0-beta /bin/bash
+    ```
+
+    要测试Docker是否正常工作，请运行下面的Python代码并检查输出：
+    ```python
+    import numpy as np
+    import mindspore.context as context
+    from mindspore import Tensor
+    from mindspore.ops import functional as F
+
+    context.set_context(device_target="GPU")
+
+    x = Tensor(np.ones([1,3,3,4]).astype(np.float32))
+    y = Tensor(np.ones([1,3,3,4]).astype(np.float32))
+    print(F.tensor_add(x, y))
+    ```
+    ```
+    [[[ 2.  2.  2.  2.],
+    [ 2.  2.  2.  2.],
+    [ 2.  2.  2.  2.]],
+
+    [[ 2.  2.  2.  2.],
+    [ 2.  2.  2.  2.],
+    [ 2.  2.  2.  2.]],
+
+    [[ 2.  2.  2.  2.],
+    [ 2.  2.  2.  2.],
+    [ 2.  2.  2.  2.]]]
+    ```
+
+如果您想了解更多关于MindSpore Docker镜像的构建过程，请查看[docker](docker/README.md) repo了解详细信息。
+
+## 快速入门
+
+参考[快速入门](https://www.mindspore.cn/tutorial/zh-CN/master/quick_start/quick_start.html)实现图片分类。
+
+
+## 文档
+
+有关安装指南、教程和API的更多详细信息，请参阅[用户文档](https://gitee.com/mindspore/docs)。
+
+## 社区
+
+### 治理
+
+查看MindSpore如何进行[开放治理](https://gitee.com/mindspore/community/blob/master/governance.md)。
+
+### 交流
+
+- [MindSpore Slack](https://join.slack.com/t/mindspore/shared_invite/zt-dgk65rli-3ex4xvS4wHX7UDmsQmfu8w) 开发者交流平台。
+- `#mindspore`IRC频道（仅用于会议记录）
+- 视频会议：待定
+- 邮件列表：<https://mailweb.mindspore.cn/postorius/lists>
+
+## 贡献
+
+欢迎参与贡献。更多详情，请参阅我们的[贡献者Wiki](CONTRIBUTING.md)。
+
+
+## 版本说明
+
+版本说明请参阅[RELEASE](RELEASE.md)。
+
+## 许可证
+
+[Apache License 2.0](LICENSE)
\ No newline at end of file
diff --git a/mindspore/ccsrc/pybind_api/ir/tensor_py.cc b/mindspore/ccsrc/pybind_api/ir/tensor_py.cc
index 6216ff02f1e..76453f8662f 100644
--- a/mindspore/ccsrc/pybind_api/ir/tensor_py.cc
+++ b/mindspore/ccsrc/pybind_api/ir/tensor_py.cc
@@ -150,7 +150,7 @@ TensorPtr TensorPy::MakeTensor(const py::array &input, const TypePtr &type_ptr)
   // Get tensor shape.
   std::vector<int> shape(buf.shape.begin(), buf.shape.end());
   if (data_type == buf_type) {
-    // Use memory copy if input data type is same as the required type.
+    // Use memory copy if input data type is the same as the required type.
     return std::make_shared<Tensor>(data_type, shape, buf.ptr, buf.size * buf.itemsize);
   }
   // Create tensor with data type converted.
diff --git a/mindspore/context.py b/mindspore/context.py
index eecdc291bfa..6240cbcadb1 100644
--- a/mindspore/context.py
+++ b/mindspore/context.py
@@ -546,9 +546,11 @@ def set_context(**kwargs):
 
     Note:
         Attribute name is required for setting attributes.
+        It is not recommended to change the mode after the net is initialized, because the implementations of some
+        operations differ between graph mode and pynative mode. The default mode is PYNATIVE_MODE.
 
     Args:
-        mode (int): Running in GRAPH_MODE(0) or PYNATIVE_MODE(1). Default: PYNATIVE_MODE.
+        mode (int): Running in GRAPH_MODE(0) or PYNATIVE_MODE(1).
         device_target (str): The target device to run, support "Ascend", "GPU", "CPU". Default: "Ascend".
         device_id (int): Id of target device, the value must be in [0, device_num_per_host-1],
                     while device_num_per_host should no more than 4096. Default: 0.
diff --git a/mindspore/nn/cell.py b/mindspore/nn/cell.py
index bf7d1605db7..a3c464d74fa 100755
--- a/mindspore/nn/cell.py
+++ b/mindspore/nn/cell.py
@@ -148,7 +148,7 @@ class Cell:
 
     def update_cell_type(self, cell_type):
         """
-        Update the current cell type mainly identify if quantization aware training network.
+        The current cell type is updated when a quantization aware training network is encountered.
 
         After being invoked, it can set the cell type to 'cell_type'.
         """
@@ -936,7 +936,7 @@ class GraphKernel(Cell):
     Base class for GraphKernel.
 
     A `GraphKernel` a composite of basic primitives and can be compiled into a fused kernel automatically when
-    context.set_context(enable_graph_kernel=True).
+    enable_graph_kernel in context is set to True.
 
     Examples:
         >>> class Relu(GraphKernel):
diff --git a/mindspore/nn/graph_kernels/graph_kernels.py b/mindspore/nn/graph_kernels/graph_kernels.py
index 21a4c38ac5b..6a43af2734d 100644
--- a/mindspore/nn/graph_kernels/graph_kernels.py
+++ b/mindspore/nn/graph_kernels/graph_kernels.py
@@ -661,7 +661,7 @@ class LogSoftmax(GraphKernel):
     Log Softmax activation function.
 
     Applies the Log Softmax function to the input tensor on the specified axis.
-    Suppose a slice along the given aixs :math:`x` then for each element :math:`x_i`
+    Suppose a slice in the given axis :math:`x` then for each element :math:`x_i`
     the Log Softmax function is shown as follows:
 
     .. math::
@@ -987,10 +987,10 @@ class LayerNorm(Cell):
     Applies Layer Normalization over a mini-batch of inputs.
 
     Layer normalization is widely used in recurrent neural networks. It applies
-    normalization over a mini-batch of inputs for each single training case as described
+    normalization on a mini-batch of inputs for each single training case as described
     in the paper `Layer Normalization <https://arxiv.org/pdf/1607.06450.pdf>`_. Unlike batch
     normalization, layer normalization performs exactly the same computation at training and
-    testing times. It can be described using the following formula. It is applied across all channels
+    testing time. It can be described using the following formula. It is applied across all channels
     and pixel but only one batch size.
 
     .. math::
@@ -1139,9 +1139,9 @@ class LambNextMV(GraphKernel):
     Outputs:
         Tuple of 2 Tensor.
 
-        - **add3** (Tensor) - The shape is same as the shape after broadcasting, and the data type is
+        - **add3** (Tensor) - The shape is the same as the shape after broadcasting, and the data type is
                               the one with high precision or high digits among the inputs.
-        - **realdiv4** (Tensor) - The shape is same as the shape after broadcasting, and the data type is
+        - **realdiv4** (Tensor) - The shape is the same as the shape after broadcasting, and the data type is
                                   the one with high precision or high digits among the inputs.
 
     Examples:
diff --git a/mindspore/nn/layer/activation.py b/mindspore/nn/layer/activation.py
index 8135e61a3bd..11205b064ca 100644
--- a/mindspore/nn/layer/activation.py
+++ b/mindspore/nn/layer/activation.py
@@ -55,7 +55,7 @@ class Softmax(Cell):
     .. math::
         \text{softmax}(x_{i}) =  \frac{\exp(x_i)}{\sum_{j=0}^{n-1}\exp(x_j)},
 
-    where :math:`x_{i}` is the :math:`i`-th slice along the given dim of the input Tensor.
+    where :math:`x_{i}` is the :math:`i`-th slice in the given dimension of the input Tensor.
 
     Args:
         axis (Union[int, tuple[int]]): The axis to apply Softmax operation, -1 means the last dimension. Default: -1.
@@ -87,11 +87,11 @@ class LogSoftmax(Cell):
 
     Applies the LogSoftmax function to n-dimensional input tensor.
 
-    The input is transformed with Softmax function and then with log function to lie in range[-inf,0).
+    The input is transformed by the Softmax function and then by the log function to lie in the range [-inf, 0).
 
     Logsoftmax is defined as:
     :math:`\text{logsoftmax}(x_i) = \log \left(\frac{\exp(x_i)}{\sum_{j=0}^{n-1} \exp(x_j)}\right)`,
-    where :math:`x_{i}` is the :math:`i`-th slice along the given dim of the input Tensor.
+    where :math:`x_{i}` is the :math:`i`-th slice in the given dimension of the input Tensor.
 
     Args:
         axis (int): The axis to apply LogSoftmax operation, -1 means the last dimension. Default: -1.
@@ -123,7 +123,7 @@ class ELU(Cell):
     Exponential Linear Uint activation function.
 
     Applies the exponential linear unit function element-wise.
-    The activation function defined as:
+    The activation function is defined as:
 
     .. math::
         E_{i} =
@@ -162,7 +162,7 @@ class ReLU(Cell):
 
     Applies the rectified linear unit function element-wise. It returns
     element-wise :math:`\max(0, x)`, specially, the neurons with the negative output
-    will suppressed and the active neurons will stay the same.
+    will be suppressed and the active neurons will stay the same.
 
     Inputs:
         - **input_data** (Tensor) - The input of ReLU.
@@ -197,7 +197,7 @@ class ReLU6(Cell):
         - **input_data** (Tensor) - The input of ReLU6.
 
     Outputs:
-        Tensor, which has the same type with `input_data`.
+        Tensor, which has the same type as `input_data`.
 
     Examples:
         >>> input_x = Tensor(np.array([-1, -2, 0, 2, 1]), mindspore.float16)
@@ -234,7 +234,7 @@ class LeakyReLU(Cell):
         - **input_x** (Tensor) - The input of LeakyReLU.
 
     Outputs:
-        Tensor, has the same type and shape with the `input_x`.
+        Tensor, has the same type and shape as the `input_x`.
 
     Examples:
         >>> input_x = Tensor(np.array([[-1.0, 4.0, -8.0], [2.0, -5.0, 9.0]]), mindspore.float32)
@@ -365,7 +365,7 @@ class PReLU(Cell):
     PReLU is defined as: :math:`prelu(x_i)= \max(0, x_i) + w * \min(0, x_i)`, where :math:`x_i`
     is an element of an channel of the input.
 
-    Here :math:`w` is an learnable parameter with default initial value 0.25.
+    Here :math:`w` is a learnable parameter with a default initial value of 0.25.
     Parameter :math:`w` has dimensionality of the argument channel. If called without argument
     channel, a single parameter :math:`w` will be shared across all channels.
 
@@ -413,7 +413,7 @@ class PReLU(Cell):
 
 class HSwish(Cell):
     r"""
-    rHard swish activation function.
+    Hard swish activation function.
 
     Applies hswish-type activation element-wise. The input is a Tensor with any valid shape.
 
@@ -422,7 +422,7 @@ class HSwish(Cell):
     .. math::
         \text{hswish}(x_{i}) = x_{i} * \frac{ReLU6(x_{i} + 3)}{6},
 
-    where :math:`x_{i}` is the :math:`i`-th slice along the given dim of the input Tensor.
+    where :math:`x_{i}` is the :math:`i`-th slice in the given dimension of the input Tensor.
 
     Inputs:
         - **input_data** (Tensor) - The input of HSwish.
@@ -456,7 +456,7 @@ class HSigmoid(Cell):
     .. math::
         \text{hsigmoid}(x_{i}) = max(0, min(1, \frac{x_{i} + 3}{6})),
 
-    where :math:`x_{i}` is the :math:`i`-th slice along the given dim of the input Tensor.
+    where :math:`x_{i}` is the :math:`i`-th slice in the given dimension of the input Tensor.
 
     Inputs:
         - **input_data** (Tensor) - The input of HSigmoid.
diff --git a/mindspore/nn/layer/basic.py b/mindspore/nn/layer/basic.py
index a822cf75678..d688c6a2bdf 100644
--- a/mindspore/nn/layer/basic.py
+++ b/mindspore/nn/layer/basic.py
@@ -65,7 +65,7 @@ class Dropout(Cell):
         dtype (:class:`mindspore.dtype`): Data type of input. Default: mindspore.float32.
 
     Raises:
-        ValueError: If keep_prob is not in range (0, 1).
+        ValueError: If `keep_prob` is not in range (0, 1).
 
     Inputs:
         - **input** (Tensor) - An N-D Tensor.
@@ -373,8 +373,8 @@ class OneHot(Cell):
         axis is created at dimension `axis`.
 
     Args:
-        axis (int): Features x depth if axis == -1, depth x features
-                    if axis == 0. Default: -1.
+        axis (int): Features x depth if axis is -1, depth x features
+                    if axis is 0. Default: -1.
         depth (int): A scalar defining the depth of the one hot dimension. Default: 1.
         on_value (float): A scalar defining the value to fill in output[i][j]
                           when indices[j] = i. Default: 1.0.
@@ -492,18 +492,18 @@ class Unfold(Cell):
     The input tensor must be a 4-D tensor and the data format is NCHW.
 
     Args:
-        ksizes (Union[tuple[int], list[int]]): The size of sliding window, should be a tuple or list of int,
+        ksizes (Union[tuple[int], list[int]]): The size of sliding window, should be a tuple or a list of integers,
             and the format is [1, ksize_row, ksize_col, 1].
         strides (Union[tuple[int], list[int]]): Distance between the centers of the two consecutive patches,
             should be a tuple or list of int, and the format is [1, stride_row, stride_col, 1].
-        rates (Union[tuple[int], list[int]]): In each extracted patch, the gap between the corresponding dim
-            pixel positions, should be a tuple or list of int, and the format is [1, rate_row, rate_col, 1].
+        rates (Union[tuple[int], list[int]]): In each extracted patch, the gap between the corresponding dimension
+            pixel positions, should be a tuple or a list of integers, and the format is [1, rate_row, rate_col, 1].
         padding (str): The type of padding algorithm, is a string whose value is "same" or "valid",
             not case sensitive. Default: "valid".
 
             - same: Means that the patch can take the part beyond the original image, and this part is filled with 0.
 
-            - valid: Means that the patch area taken must be completely contained in the original image.
+            - valid: Means that the patch area taken must be completely contained in the original image.
 
     Inputs:
         - **input_x** (Tensor) - A 4-D tensor whose shape is [in_batch, in_depth, in_row, in_col] and
@@ -511,7 +511,7 @@ class Unfold(Cell):
 
     Outputs:
         Tensor, a 4-D tensor whose data type is same as 'input_x',
-        and the shape is [out_batch, out_depth, out_row, out_col], the out_batch is same as the in_batch.
+        and the shape is [out_batch, out_depth, out_row, out_col], the out_batch is the same as the in_batch.
 
     Examples:
         >>> net = Unfold(ksizes=[1, 2, 2, 1], strides=[1, 1, 1, 1], rates=[1, 1, 1, 1])
@@ -556,11 +556,11 @@ class MatrixDiag(Cell):
     Returns a batched diagonal tensor with a given batched diagonal values.
 
     Inputs:
-        - **x** (Tensor) - The diagonal values. It can be of the following data types:
-          float32, float16, int32, int8, uint8.
+        - **x** (Tensor) - The diagonal values. It can be one of the following data types:
+          float32, float16, int32, int8, and uint8.
 
     Outputs:
-        Tensor, same type as input `x`. The shape should be x.shape + (x.shape[-1], ).
+        Tensor, has the same type as input `x`. The shape should be x.shape + (x.shape[-1], ).
 
     Examples:
         >>> x = Tensor(np.array([1, -1]), mstype.float32)
@@ -587,11 +587,11 @@ class MatrixDiagPart(Cell):
     Returns the batched diagonal part of a batched tensor.
 
     Inputs:
-        - **x** (Tensor) - The batched tensor. It can be of the following data types:
-          float32, float16, int32, int8, uint8.
+        - **x** (Tensor) - The batched tensor. It can be one of the following data types:
+          float32, float16, int32, int8, and uint8.
 
     Outputs:
-        Tensor, same type as input `x`. The shape should be x.shape[:-2] + [min(x.shape[-2:])].
+        Tensor, has the same type as input `x`. The shape should be x.shape[:-2] + [min(x.shape[-2:])].
 
     Examples:
         >>> x = Tensor([[[-1, 0], [0, 1]], [[-1, 0], [0, 1]], [[-1, 0], [0, 1]]], mindspore.float32)
@@ -617,12 +617,12 @@ class MatrixSetDiag(Cell):
     Modify the batched diagonal part of a batched tensor.
 
     Inputs:
-        - **x** (Tensor) - The batched tensor. It can be of the following data types:
-          float32, float16, int32, int8, uint8.
+        - **x** (Tensor) - The batched tensor. It can be one of the following data types:
+          float32, float16, int32, int8, and uint8.
         - **diagonal** (Tensor) - The diagonal values.
 
     Outputs:
-        Tensor, same type as input `x`. The shape same as `x`.
+        Tensor, has the same type and shape as input `x`.
 
     Examples:
         >>> x = Tensor([[[-1, 0], [0, 1]], [[-1, 0], [0, 1]], [[-1, 0], [0, 1]]], mindspore.float32)
diff --git a/mindspore/nn/layer/container.py b/mindspore/nn/layer/container.py
index c881f417446..843a0784d54 100644
--- a/mindspore/nn/layer/container.py
+++ b/mindspore/nn/layer/container.py
@@ -72,7 +72,7 @@ class SequentialCell(Cell):
         args (list, OrderedDict): List of subclass of Cell.
 
     Raises:
-        TypeError: If arg is not of type list or OrderedDict.
+        TypeError: If the type of the argument is not list or OrderedDict.
 
     Inputs:
         - **input** (Tensor) - Tensor with shape according to the first Cell in the sequence.
diff --git a/mindspore/nn/layer/conv.py b/mindspore/nn/layer/conv.py
index 185271fb531..9a1d132447a 100644
--- a/mindspore/nn/layer/conv.py
+++ b/mindspore/nn/layer/conv.py
@@ -131,7 +131,7 @@ class Conv2d(_Conv):
     Args:
         in_channels (int): The number of input channel :math:`C_{in}`.
         out_channels (int): The number of output channel :math:`C_{out}`.
-        kernel_size (Union[int, tuple[int]]): The data type is int or tuple with 2 integers. Specifies the height
+        kernel_size (Union[int, tuple[int]]): The data type is int or a tuple of 2 integers. Specifies the height
             and width of the 2D convolution window. Single int means the value is for both the height and the width of
             the kernel. A tuple of 2 ints means the first value is for the height and the other is for the
             width of the kernel.
@@ -147,7 +147,7 @@ class Conv2d(_Conv):
               last extra padding will be done from the bottom and the right side. If this mode is set, `padding`
               must be 0.
 
-            - valid: Adopts the way of discarding. The possibly largest height and width of output will be returned
+            - valid: Adopts the way of discarding. The largest possible height and width of output will be returned
               without padding. Extra pixels will be discarded. If this mode is set, `padding`
               must be 0.
 
@@ -158,7 +158,7 @@ class Conv2d(_Conv):
                     the padding of top, bottom, left and right is the same, equal to padding. If `padding` is a tuple
                     with four integers, the padding of top, bottom, left and right will be equal to padding[0],
                     padding[1], padding[2], and padding[3] accordingly. Default: 0.
-        dilation (Union[int, tuple[int]]): The data type is int or tuple with 2 integers. Specifies the dilation rate
+        dilation (Union[int, tuple[int]]): The data type is int or a tuple of 2 integers. Specifies the dilation rate
                                       to use for dilated convolution. If set to be :math:`k > 1`, there will
                                       be :math:`k - 1` pixels skipped for each sampling location. Its value should
                                       be greater or equal to 1 and bounded by the height and width of the
@@ -451,7 +451,7 @@ class Conv2dTranspose(_Conv):
     Args:
         in_channels (int): The number of channels in the input space.
         out_channels (int): The number of channels in the output space.
-        kernel_size (Union[int, tuple]): int or tuple with 2 integers, which specifies the  height
+        kernel_size (Union[int, tuple]): int or a tuple of 2 integers, which specifies the  height
             and width of the 2D convolution window. Single int means the value is for both the height and the width of
             the kernel. A tuple of 2 ints means the first value is for the height and the other is for the
             width of the kernel.
@@ -825,7 +825,7 @@ class DepthwiseConv2d(Cell):
     Args:
         in_channels (int): The number of input channel :math:`C_{in}`.
         out_channels (int): The number of output channel :math:`C_{out}`.
-        kernel_size (Union[int, tuple[int]]): The data type is int or tuple with 2 integers. Specifies the height
+        kernel_size (Union[int, tuple[int]]): The data type is int or a tuple of 2 integers. Specifies the height
             and width of the 2D convolution window. Single int means the value is for both the height and the width of
             the kernel. A tuple of 2 ints means the first value is for the height and the other is for the
             width of the kernel.
@@ -841,7 +841,7 @@ class DepthwiseConv2d(Cell):
               last extra padding will be done from the bottom and the right side. If this mode is set, `padding`
               must be 0.
 
-            - valid: Adopts the way of discarding. The possibly largest height and width of output will be returned
+            - valid: Adopts the way of discarding. The largest possible height and width of output will be returned
               without padding. Extra pixels will be discarded. If this mode is set, `padding`
               must be 0.
 
@@ -849,16 +849,16 @@ class DepthwiseConv2d(Cell):
               Tensor borders. `padding` should be greater than or equal to 0.
 
         padding (int): Implicit paddings on both sides of the input. Default: 0.
-        dilation (Union[int, tuple[int]]): The data type is int or tuple with 2 integers. Specifies the dilation rate
+        dilation (Union[int, tuple[int]]): The data type is int or a tuple of 2 integers. Specifies the dilation rate
                                       to use for dilated convolution. If set to be :math:`k > 1`, there will
                                       be :math:`k - 1` pixels skipped for each sampling location. Its value should
-                                      be greater or equal to 1 and bounded by the height and width of the
+                                      be greater than or equal to 1 and bounded by the height and width of the
                                       input. Default: 1.
         group (int): Split filter into groups, `in_ channels` and `out_channels` should be
             divisible by the number of groups. Default: 1.
         has_bias (bool): Specifies whether the layer uses a bias vector. Default: False.
         weight_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the convolution kernel.
-            It can be a Tensor, a string, an Initializer or a numbers.Number. When a string is specified,
+            It can be a Tensor, a string, an Initializer or a number. When a string is specified,
             values from 'TruncatedNormal', 'Normal', 'Uniform', 'HeUniform' and 'XavierUniform' distributions as well
             as constant 'One' and 'Zero' distributions are possible. Alias 'xavier_uniform', 'he_uniform', 'ones'
             and 'zeros' are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of
diff --git a/mindspore/nn/layer/embedding.py b/mindspore/nn/layer/embedding.py
index 9a72a0a1739..83241239f3b 100755
--- a/mindspore/nn/layer/embedding.py
+++ b/mindspore/nn/layer/embedding.py
@@ -36,7 +36,7 @@ class Embedding(Cell):
     the corresponding word embeddings.
 
     Note:
-        When 'use_one_hot' is set to True, the input should be of type mindspore.int32.
+        When 'use_one_hot' is set to True, the type of the input should be mindspore.int32.
 
     Args:
         vocab_size (int): Size of the dictionary of embeddings.
@@ -48,9 +48,9 @@ class Embedding(Cell):
         dtype (:class:`mindspore.dtype`): Data type of input. Default: mindspore.float32.
 
     Inputs:
-        - **input** (Tensor) - Tensor of shape :math:`(\text{batch_size}, \text{input_length})`. The element of
-          the Tensor should be integer and not larger than vocab_size. else the corresponding embedding vector is zero
-          if larger than vocab_size.
+        - **input** (Tensor) - Tensor of shape :math:`(\text{batch_size}, \text{input_length})`. The elements of
+          the Tensor should be integers and not larger than vocab_size. Otherwise, the corresponding embedding vector
+          will be zero.
 
     Outputs:
         Tensor of shape :math:`(\text{batch_size}, \text{input_length}, \text{embedding_size})`.
diff --git a/mindspore/nn/layer/image.py b/mindspore/nn/layer/image.py
index 88ab386c1ae..fce4250bc5c 100644
--- a/mindspore/nn/layer/image.py
+++ b/mindspore/nn/layer/image.py
@@ -253,7 +253,7 @@ class MSSSIM(Cell):
     Args:
         max_val (Union[int, float]): The dynamic range of the pixel values (255 for 8-bit grayscale images).
           Default: 1.0.
-        power_factors (Union[tuple, list]): Iterable of weights for each of the scales.
+        power_factors (Union[tuple, list]): Iterable of weights for each scale.
           Default: (0.0448, 0.2856, 0.3001, 0.2363, 0.1333). Default values obtained by Wang et al.
         filter_size (int): The size of the Gaussian filter. Default: 11.
         filter_sigma (float): The standard deviation of Gaussian kernel. Default: 1.5.
diff --git a/mindspore/nn/layer/lstm.py b/mindspore/nn/layer/lstm.py
index c640f89557c..7987e42a518 100755
--- a/mindspore/nn/layer/lstm.py
+++ b/mindspore/nn/layer/lstm.py
@@ -35,7 +35,7 @@ class LSTM(Cell):
     Applies a LSTM to the input.
 
     There are two pipelines connecting two consecutive cells in a LSTM model; one is cell state pipeline
-    and another is hidden state pipeline. Denote two consecutive time nodes as :math:`t-1` and :math:`t`.
+    and the other is hidden state pipeline. Denote two consecutive time nodes as :math:`t-1` and :math:`t`.
     Given an input :math:`x_t` at time :math:`t`, an hidden state :math:`h_{t-1}` and an cell
     state :math:`c_{t-1}` of the layer at time :math:`{t-1}`, the cell state and hidden state at
     time :math:`t` is computed using an gating mechanism. Input gate :math:`i_t` is designed to protect the cell
@@ -68,18 +68,17 @@ class LSTM(Cell):
         input_size (int): Number of features of input.
         hidden_size (int):  Number of features of hidden layer.
         num_layers (int): Number of layers of stacked LSTM . Default: 1.
-        has_bias (bool): Specifies whether has bias `b_ih` and `b_hh`. Default: True.
+        has_bias (bool): Whether the cell has bias `b_ih` and `b_hh`. Default: True.
         batch_first (bool): Specifies whether the first dimension of input is batch_size. Default: False.
         dropout (float, int): If not 0, append `Dropout` layer on the outputs of each
             LSTM layer except the last layer. Default 0. The range of dropout is [0.0, 1.0].
-        bidirectional (bool): Specifies whether this is a bidirectional LSTM. If set True,
-            number of directions will be 2 otherwise number of directions is 1. Default: False.
+        bidirectional (bool): Specifies whether it is a bidirectional LSTM. Default: False.
 
     Inputs:
         - **input** (Tensor) - Tensor of shape (seq_len, batch_size, `input_size`).
         - **hx** (tuple) - A tuple of two Tensors (h_0, c_0) both of data type mindspore.float32 or
           mindspore.float16 and shape (num_directions * `num_layers`, batch_size, `hidden_size`).
-          Data type of `hx` should be the same of `input`.
+          Data type of `hx` should be the same as `input`.
 
     Outputs:
         Tuple, a tuple constains (`output`, (`h_n`, `c_n`)).
@@ -205,7 +204,7 @@ class LSTMCell(Cell):
     Applies a LSTM layer to the input.
 
     There are two pipelines connecting two consecutive cells in a LSTM model; one is cell state pipeline
-    and another is hidden state pipeline. Denote two consecutive time nodes as :math:`t-1` and :math:`t`.
+    and the other is hidden state pipeline. Denote two consecutive time nodes as :math:`t-1` and :math:`t`.
     Given an input :math:`x_t` at time :math:`t`, an hidden state :math:`h_{t-1}` and an cell
     state :math:`c_{t-1}` of the layer at time :math:`{t-1}`, the cell state and hidden state at
     time :math:`t` is computed using an gating mechanism. Input gate :math:`i_t` is designed to protect the cell
@@ -238,7 +237,7 @@ class LSTMCell(Cell):
         input_size (int): Number of features of input.
         hidden_size (int):  Number of features of hidden layer.
         layer_index (int): index of current layer of stacked LSTM . Default: 0.
-        has_bias (bool): Specifies whether has bias `b_ih` and `b_hh`. Default: True.
+        has_bias (bool): Whether the cell has bias `b_ih` and `b_hh`. Default: True.
         batch_first (bool): Specifies whether the first dimension of input is batch_size. Default: False.
         dropout (float, int): If not 0, append `Dropout` layer on the outputs of each
             LSTM layer except the last layer. Default 0. The range of dropout is [0.0, 1.0].
diff --git a/mindspore/nn/layer/normalization.py b/mindspore/nn/layer/normalization.py
index b1296732100..843e09b2ca3 100644
--- a/mindspore/nn/layer/normalization.py
+++ b/mindspore/nn/layer/normalization.py
@@ -243,6 +243,10 @@ class BatchNorm1d(_BatchNorm):
     .. math::
         y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta
 
+    Note:
+        The implementation of BatchNorm is different in graph mode and pynative mode, therefore changing the
+        mode after the net has been initialized is not recommended.
+
     Args:
         num_features (int): `C` from an expected input of size (N, C).
         eps (float): A value added to the denominator for numerical stability. Default: 1e-5.
@@ -319,6 +323,10 @@ class BatchNorm2d(_BatchNorm):
     .. math::
         y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta
 
+    Note:
+        The implementation of BatchNorm is different in graph mode and pynative mode, therefore the mode cannot be
+        changed after the net has been initialized.
+
     Args:
         num_features (int): `C` from an expected input of size (N, C, H, W).
         eps (float): A value added to the denominator for numerical stability. Default: 1e-5.
@@ -384,8 +392,8 @@ class GlobalBatchNorm(_BatchNorm):
     r"""
     Global normalization layer over a N-dimension input.
 
-    Global Normalization is cross device synchronized batch normalization. Batch Normalization implementation
-    only normalize the data within each device. Global normalization will normalize the input within the group.
+    Global Normalization is cross-device synchronized batch normalization. The implementation of Batch Normalization
+    only normalizes the data within each device. Global normalization will normalize the input within the group.
     It has been described in the paper `Batch Normalization: Accelerating Deep Network Training by
     Reducing Internal Covariate Shift <https://arxiv.org/abs/1502.03167>`_. It rescales and recenters the
     feature using a mini-batch of data and the learned parameters which can be described in the following formula.
@@ -467,10 +475,10 @@ class LayerNorm(Cell):
     Applies Layer Normalization over a mini-batch of inputs.
 
     Layer normalization is widely used in recurrent neural networks. It applies
-    normalization over a mini-batch of inputs for each single training case as described
+    normalization on a mini-batch of inputs for each single training case as described
     in the paper `Layer Normalization <https://arxiv.org/pdf/1607.06450.pdf>`_. Unlike batch
     normalization, layer normalization performs exactly the same computation at training and
-    testing times. It can be described using the following formula. It is applied across all channels
+    testing time. It can be described using the following formula. It is applied across all channels
     and pixel but only one batch size.
 
     .. math::
@@ -545,7 +553,7 @@ class GroupNorm(Cell):
     Group Normalization over a mini-batch of inputs.
 
     Group normalization is widely used in recurrent neural networks. It applies
-    normalization over a mini-batch of inputs for each single training case as described
+    normalization on a mini-batch of inputs for each single training case as described
     in the paper `Group Normalization <https://arxiv.org/pdf/1803.08494.pdf>`_. Group normalization
     divides the channels into groups and computes within each group the mean and variance for normalization,
     and it performs very stable over a wide range of batch size. It can be described using the following formula.
@@ -557,7 +565,7 @@ class GroupNorm(Cell):
         num_groups (int): The number of groups to be divided along the channel dimension.
         num_channels (int): The number of channels per group.
         eps (float): A value added to the denominator for numerical stability. Default: 1e-5.
-        affine (bool): A bool value, this layer will has learnable affine parameters when set to true. Default: True.
+        affine (bool): A bool value. This layer will have learnable affine parameters when set to true. Default: True.
         gamma_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the gamma weight.
             The values of str refer to the function `initializer` including 'zeros', 'ones', 'xavier_uniform',
             'he_uniform', etc. Default: 'ones'.
diff --git a/mindspore/nn/layer/quant.py b/mindspore/nn/layer/quant.py
index e809f21b72a..2aa075b1212 100644
--- a/mindspore/nn/layer/quant.py
+++ b/mindspore/nn/layer/quant.py
@@ -61,7 +61,7 @@ class Conv2dBnAct(Cell):
     Args:
         in_channels (int): The number of input channel :math:`C_{in}`.
         out_channels (int): The number of output channel :math:`C_{out}`.
-        kernel_size (Union[int, tuple]): The data type is int or tuple with 2 integers. Specifies the height
+        kernel_size (Union[int, tuple]): The data type is int or a tuple of 2 integers. Specifies the height
             and width of the 2D convolution window. Single int means the value is for both height and width of
             the kernel. A tuple of 2 ints means the first value is for the height and the other is for the
             width of the kernel.
@@ -292,19 +292,19 @@ class BatchNormFoldCell(Cell):
 
 class FakeQuantWithMinMax(Cell):
     r"""
-    Quantization aware op. This OP provide Fake quantization observer function on data with min and max.
+    Quantization aware op. This OP provides the fake quantization observer function on data with min and max.
 
     Args:
         min_init (int, float): The dimension of channel or 1(layer). Default: -6.
         max_init (int, float): The dimension of channel or 1(layer). Default: 6.
-        ema (bool): Exponential Moving Average algorithm update min and max. Default: False.
+        ema (bool): Whether the Exponential Moving Average algorithm is used to update min and max. Default: False.
         ema_decay (float): Exponential Moving Average algorithm parameter. Default: 0.999.
         per_channel (bool):  Quantization granularity based on layer or on channel. Default: False.
         channel_axis (int): Quantization by channel axis. Default: 1.
         num_channels (int): declarate the min and max channel size, Default: 1.
-        num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
-        symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
-        narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
+        num_bits (int): The number of quantization bits; 4 and 8 bits are supported. Default: 8.
+        symmetric (bool): Whether the quantization algorithm is symmetric or not. Default: False.
+        narrow_range (bool): Whether the quantization algorithm uses narrow range or not. Default: False.
         quant_delay (int): Quantization delay parameters according to the global step. Default: 0.
 
     Inputs:
@@ -431,7 +431,7 @@ class Conv2dBnFoldQuant(Cell):
             variance vector. Default: 'ones'.
         fake (bool): Whether Conv2dBnFoldQuant Cell adds FakeQuantWithMinMax op. Default: True.
         per_channel (bool): FakeQuantWithMinMax Parameters. Default: False.
-        num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
+        num_bits (int): The number of quantization bits; 4 and 8 bits are supported. Default: 8.
         symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
         narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
         quant_delay (int): The Quantization delay parameters according to the global step. Default: 0.
@@ -614,7 +614,7 @@ class Conv2dBnWithoutFoldQuant(Cell):
             Default: 'normal'.
         bias_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the bias vector. Default: 'zeros'.
         per_channel (bool): FakeQuantWithMinMax Parameters. Default: False.
-        num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
+        num_bits (int): The number of quantization bits; 4 and 8 bits are supported. Default: 8.
         symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
         narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
         quant_delay (int): Quantization delay parameters according to the global step. Default: 0.
@@ -736,7 +736,7 @@ class Conv2dQuant(Cell):
             Default: 'normal'.
         bias_init (Union[Tensor, str, Initializer, numbers.Number]): Initializer for the bias vector. Default: 'zeros'.
         per_channel (bool): FakeQuantWithMinMax Parameters. Default: False.
-        num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
+        num_bits (int): The number of quantization bits; 4 and 8 bits are supported. Default: 8.
         symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
         narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
         quant_delay (int): Quantization delay parameters according to the global step. Default: 0.
@@ -845,7 +845,7 @@ class DenseQuant(Cell):
         has_bias (bool): Specifies whether the layer uses a bias vector. Default: True.
         activation (str): The regularization function applied to the output of the layer, eg. 'relu'. Default: None.
         per_channel (bool): FakeQuantWithMinMax Parameters. Default: False.
-        num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
+        num_bits (int): The number of quantization bits; 4 and 8 bits are supported. Default: 8.
         symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
         narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
         quant_delay (int): Quantization delay parameters according to the global step. Default: 0.
@@ -947,15 +947,14 @@ class ActQuant(_QuantActivation):
     r"""
     Quantization aware training activation function.
 
-    Add Fake Quant OP after activation. Not Recommand to used these cell for Fake Quant Op
-    Will climp the max range of the activation and the relu6 do the same operation.
-    This part is a more detailed overview of ReLU6 op.
+    Add the fake quant op after the activation op, which truncates the output of the activation op.
+    Please check `FakeQuantWithMinMax` for more details.
 
     Args:
         activation (Cell): Activation cell class.
         ema_decay (float): Exponential Moving Average algorithm parameter. Default: 0.999.
         per_channel (bool):  Quantization granularity based on layer or on channel. Default: False.
-        num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
+        num_bits (int): The number of quantization bits; 4 and 8 bits are supported. Default: 8.
         symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
         narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
         quant_delay (int): Quantization delay parameters according to the global steps. Default: 0.
@@ -1010,7 +1009,7 @@ class LeakyReLUQuant(_QuantActivation):
         activation (Cell): Activation cell class.
         ema_decay (float): Exponential Moving Average algorithm parameter. Default: 0.999.
         per_channel (bool):  Quantization granularity based on layer or on channel. Default: False.
-        num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
+        num_bits (int): The number of quantization bits; 4 and 8 bits are supported. Default: 8.
         symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
         narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
         quant_delay (int): Quantization delay parameters according to the global step. Default: 0.
@@ -1080,9 +1079,9 @@ class HSwishQuant(_QuantActivation):
         activation (Cell): Activation cell class.
         ema_decay (float): Exponential Moving Average algorithm parameter. Default: 0.999.
         per_channel (bool):  Quantization granularity based on layer or on channel. Default: False.
-        num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
-        symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
-        narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
+        num_bits (int): The number of quantization bits; 4 and 8 bits are supported. Default: 8.
+        symmetric (bool): Whether the quantization algorithm is symmetric or not. Default: False.
+        narrow_range (bool): Whether the quantization algorithm uses narrow range or not. Default: False.
         quant_delay (int): Quantization delay parameters according to the global step. Default: 0.
 
     Inputs:
@@ -1149,9 +1148,9 @@ class HSigmoidQuant(_QuantActivation):
         activation (Cell): Activation cell class.
         ema_decay (float): Exponential Moving Average algorithm parameter. Default: 0.999.
         per_channel (bool):  Quantization granularity based on layer or on channel. Default: False.
-        num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
-        symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
-        narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
+        num_bits (int): The number of quantization bits; 4 and 8 bits are supported. Default: 8.
+        symmetric (bool): Whether the quantization algorithm is symmetric or not. Default: False.
+        narrow_range (bool): Whether the quantization algorithm uses narrow range or not. Default: False.
         quant_delay (int): Quantization delay parameters according to the global step. Default: 0.
 
     Inputs:
@@ -1217,7 +1216,7 @@ class TensorAddQuant(Cell):
     Args:
         ema_decay (float): Exponential Moving Average algorithm parameter. Default: 0.999.
         per_channel (bool):  Quantization granularity based on layer or on channel. Default: False.
-        num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
+        num_bits (int): The number of quantization bits; 4 and 8 bits are supported. Default: 8.
         symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
         narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
         quant_delay (int): Quantization delay parameters according to the global step. Default: 0.
@@ -1269,7 +1268,7 @@ class MulQuant(Cell):
     Args:
         ema_decay (float): Exponential Moving Average algorithm parameter. Default: 0.999.
         per_channel (bool):  Quantization granularity based on layer or on channel. Default: False.
-        num_bits (int): The quantization number bit, support 4 and 8bit. Default: 8.
+        num_bits (int): The number of quantization bits; 4 and 8 bits are supported. Default: 8.
         symmetric (bool): The quantization algorithm is symmetric or not. Default: False.
         narrow_range (bool): The quantization algorithm uses narrow range or not. Default: False.
         quant_delay (int): Quantization delay parameters according to the global step. Default: 0.
diff --git a/mindspore/nn/loss/loss.py b/mindspore/nn/loss/loss.py
index f1599754cb3..ca13ce40c98 100644
--- a/mindspore/nn/loss/loss.py
+++ b/mindspore/nn/loss/loss.py
@@ -80,7 +80,7 @@ class L1Loss(_Loss):
     When argument reduction is 'sum', the sum of :math:`L(x, y)` will be returned. :math:`N` is the batch size.
 
     Args:
-        reduction (str): Type of reduction to apply to loss. The optional values are "mean", "sum", "none".
+        reduction (str): Type of reduction to be applied to loss. The optional values are "mean", "sum", and "none".
             Default: "mean".
 
     Inputs:
@@ -107,7 +107,7 @@ class L1Loss(_Loss):
 
 class MSELoss(_Loss):
     r"""
-    MSELoss create a criterion to measures the mean squared error (squared L2-norm) between :math:`x` and :math:`y`
+    MSELoss creates a criterion to measure the mean squared error (squared L2-norm) between :math:`x` and :math:`y`
     by element, where :math:`x` is the input and :math:`y` is the target.
 
     For simplicity, let :math:`x` and :math:`y` be 1-dimensional Tensor with length :math:`N`,
@@ -120,7 +120,7 @@ class MSELoss(_Loss):
     When argument reduction is 'sum', the sum of :math:`L(x, y)` will be returned. :math:`N` is the batch size.
 
     Args:
-        reduction (str): Type of reduction to apply to loss. The optional values are "mean", "sum", "none".
+        reduction (str): Type of reduction to be applied to loss. The optional values are "mean", "sum", and "none".
             Default: "mean".
 
     Inputs:
@@ -210,14 +210,14 @@ class SoftmaxCrossEntropyWithLogits(_Loss):
 
     Note:
         While the target classes are mutually exclusive, i.e., only one class is positive in the target, the predicted
-        probabilities need not be exclusive. All that is required is that the predicted probability distribution
+        probabilities need not be exclusive. It is only required that the predicted probability distribution
         of entry is a valid one.
 
     Args:
         is_grad (bool): Specifies whether calculate grad only. Default: True.
         sparse (bool): Specifies whether labels use sparse format or not. Default: False.
-        reduction (Union[str, None]): Type of reduction to apply to loss. Support 'sum' or 'mean' If None,
-            do not reduction. Default: None.
+        reduction (Union[str, None]): Type of reduction to be applied to loss. Support 'sum' and 'mean'. If None,
+            do not perform reduction. Default: None.
         smooth_factor (float): Label smoothing factor. It is a optional input which should be in range [0, 1].
             Default: 0.
         num_classes (int): The number of classes in the task. It is a optional input Default: 2.
@@ -225,7 +225,7 @@ class SoftmaxCrossEntropyWithLogits(_Loss):
     Inputs:
         - **logits** (Tensor) - Tensor of shape (N, C).
         - **labels** (Tensor) - Tensor of shape (N, ). If `sparse` is True, The type of
-          `labels` is mindspore.int32. If `sparse` is False, the type of `labels` is same as the type of `logits`.
+          `labels` is mindspore.int32. If `sparse` is False, the type of `labels` is the same as the type of `logits`.
 
     Outputs:
         Tensor, a tensor of the same shape as logits with the component-wise
@@ -282,8 +282,8 @@ class SoftmaxCrossEntropyExpand(Cell):
     where :math:`x_i` is a 1D score Tensor, :math:`t_i` is the target class.
 
     Note:
-        When argument sparse is set to True, the format of label is the index
-        range from :math:`0` to :math:`C - 1` instead of one-hot vectors.
+        When argument sparse is set to True, the format of the label is the index
+        ranging from :math:`0` to :math:`C - 1` instead of one-hot vectors.
 
     Args:
         sparse(bool): Specifies whether labels use sparse format or not. Default: False.
diff --git a/mindspore/nn/metrics/__init__.py b/mindspore/nn/metrics/__init__.py
index b82673b2ba7..aba31299f8d 100755
--- a/mindspore/nn/metrics/__init__.py
+++ b/mindspore/nn/metrics/__init__.py
@@ -69,7 +69,7 @@ def names():
 
 def get_metric_fn(name, *args, **kwargs):
     """
-    Gets the metric method base on the input name.
+    Gets the metric method based on the input name.
 
     Args:
         name (str): The name of metric method. Refer to the '__factory__'
diff --git a/mindspore/nn/metrics/metric.py b/mindspore/nn/metrics/metric.py
index d8e38796533..673403b29e1 100644
--- a/mindspore/nn/metrics/metric.py
+++ b/mindspore/nn/metrics/metric.py
@@ -82,7 +82,7 @@ class Metric(metaclass=ABCMeta):
     @abstractmethod
     def clear(self):
         """
-        A interface describes the behavior of clearing the internal evaluation result.
+        An interface that describes the behavior of clearing the internal evaluation result.
 
         Note:
             All subclasses should override this interface.
@@ -92,7 +92,7 @@ class Metric(metaclass=ABCMeta):
     @abstractmethod
     def eval(self):
         """
-        A interface describes the behavior of computing the evaluation result.
+        An interface that describes the behavior of computing the evaluation result.
 
         Note:
             All subclasses should override this interface.
@@ -102,7 +102,7 @@ class Metric(metaclass=ABCMeta):
     @abstractmethod
     def update(self, *inputs):
         """
-        A interface describes the behavior of updating the internal evaluation result.
+        An interface that describes the behavior of updating the internal evaluation result.
 
         Note:
             All subclasses should override this interface.
diff --git a/mindspore/nn/optim/adam.py b/mindspore/nn/optim/adam.py
index 400c4fc9c33..7fd1b0270ae 100755
--- a/mindspore/nn/optim/adam.py
+++ b/mindspore/nn/optim/adam.py
@@ -36,8 +36,8 @@ def _update_run_op(beta1, beta2, eps, lr, weight_decay, param, m, v, gradient, d
     Update parameters.
 
     Args:
-        beta1 (Tensor): The exponential decay rate for the 1st moment estimates. Should be in range (0.0, 1.0).
-        beta2 (Tensor): The exponential decay rate for the 2nd moment estimates. Should be in range (0.0, 1.0).
+        beta1 (Tensor): The exponential decay rate for the 1st moment estimations. Should be in range (0.0, 1.0).
+        beta2 (Tensor): The exponential decay rate for the 2nd moment estimations. Should be in range (0.0, 1.0).
         eps (Tensor): Term added to the denominator to improve numerical stability. Should be greater than 0.
         lr (Tensor): Learning rate.
         weight_decay (Number): Weight decay. Should be equal to or greater than 0.
@@ -180,12 +180,12 @@ class Adam(Optimizer):
               the order will be followed in the optimizer. There are no other keys in the `dict` and the parameters
               which in the 'order_params' should be in one of group parameters.
 
-        learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or graph for the learning rate.
-            When the learning_rate is a Iterable or a Tensor with dimension of 1, use the dynamic learning rate, then
+        learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or a graph for the learning rate.
+            When the learning_rate is an Iterable or a 1-D Tensor, use the dynamic learning rate, then
             the i-th step will take the i-th value as the learning rate. When the learning_rate is LearningRateSchedule,
             use dynamic learning rate, the i-th learning rate will be calculated during the process of training
-            according to the formula of LearningRateSchedule. When the learning_rate is a float or a Tensor with
-            dimension of 0, use fixed learning rate. Other cases are not supported. The float learning rate should be
+            according to the formula of LearningRateSchedule. When the learning_rate is a float or a 0-D
+            Tensor, use fixed learning rate. Other cases are not supported. The float learning rate should be
             equal to or greater than 0. If the type of `learning_rate` is int, it will be converted to float.
             Default: 1e-3.
         beta1 (float): The exponential decay rate for the 1st moment estimations. Should be in range (0.0, 1.0).
@@ -195,11 +195,11 @@ class Adam(Optimizer):
         eps (float): Term added to the denominator to improve numerical stability. Should be greater than 0. Default:
                      1e-8.
         use_locking (bool): Whether to enable a lock to protect updating variable tensors.
-            If True, updating of the var, m, and v tensors will be protected by a lock.
-            If False, the result is unpredictable. Default: False.
+            If true, updates of the var, m, and v tensors will be protected by a lock.
+            If false, the result is unpredictable. Default: False.
         use_nesterov (bool): Whether to use Nesterov Accelerated Gradient (NAG) algorithm to update the gradients.
-            If True, update the gradients using NAG.
-            If False, update the gradients without using NAG. Default: False.
+            If true, update the gradients using NAG.
+            If false, update the gradients without using NAG. Default: False.
         weight_decay (float): Weight decay (L2 penalty). It should be equal to or greater than 0. Default: 0.0.
         loss_scale (float): A floating point value for the loss scale. Should be greater than 0. Default: 1.0.
 
@@ -304,12 +304,12 @@ class AdamWeightDecay(Optimizer):
               the order will be followed in the optimizer. There are no other keys in the `dict` and the parameters
               which in the 'order_params' should be in one of group parameters.
 
-        learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or graph for the learning rate.
-            When the learning_rate is a Iterable or a Tensor with dimension of 1, use the dynamic learning rate, then
+        learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or a graph for the learning rate.
+            When the learning_rate is an Iterable or a 1-D Tensor, use the dynamic learning rate, then
             the i-th step will take the i-th value as the learning rate. When the learning_rate is LearningRateSchedule,
             use dynamic learning rate, the i-th learning rate will be calculated during the process of training
-            according to the formula of LearningRateSchedule. When the learning_rate is a float or a Tensor with
-            dimension of 0, use fixed learning rate. Other cases are not supported. The float learning rate should be
+            according to the formula of LearningRateSchedule. When the learning_rate is a float or a 0-D
+            Tensor, use fixed learning rate. Other cases are not supported. The float learning rate should be
             equal to or greater than 0. If the type of `learning_rate` is int, it will be converted to float.
             Default: 1e-3.
         beta1 (float): The exponential decay rate for the 1st moment estimations. Default: 0.9.
diff --git a/mindspore/nn/optim/ftrl.py b/mindspore/nn/optim/ftrl.py
index d00107dfb73..80826394fe8 100644
--- a/mindspore/nn/optim/ftrl.py
+++ b/mindspore/nn/optim/ftrl.py
@@ -114,12 +114,12 @@ class FTRL(Optimizer):
             than or equal to zero. Use fixed learning rate if lr_power is zero. Default: -0.5.
         l1 (float): l1 regularization strength, must be greater than or equal to zero. Default: 0.0.
         l2 (float): l2 regularization strength, must be greater than or equal to zero. Default: 0.0.
-        use_locking (bool): If True use locks for update operation. Default: False.
+        use_locking (bool): If True, use locks for the update operation. Default: False.
         loss_scale (float): Value for the loss scale. It should be equal to or greater than 1.0. Default: 1.0.
         weight_decay (float): Weight decay value to multiply weight, must be zero or positive value. Default: 0.0.
 
     Inputs:
-        - **grads** (tuple[Tensor]) - The gradients of `params` in optimizer, the shape is as same as the `params`
+        - **grads** (tuple[Tensor]) - The gradients of `params` in the optimizer, the shape is the same as the `params`
           in optimizer.
 
     Outputs:
diff --git a/mindspore/nn/optim/lamb.py b/mindspore/nn/optim/lamb.py
index e17e590c21d..aa33c3eeb9c 100755
--- a/mindspore/nn/optim/lamb.py
+++ b/mindspore/nn/optim/lamb.py
@@ -39,8 +39,8 @@ def _update_run_op(beta1, beta2, eps, global_step, lr, weight_decay, param, m, v
     Update parameters.
 
     Args:
-        beta1 (Tensor): The exponential decay rate for the 1st moment estimates. Should be in range (0.0, 1.0).
-        beta2 (Tensor): The exponential decay rate for the 2nd moment estimates. Should be in range (0.0, 1.0).
+        beta1 (Tensor): The exponential decay rate for the 1st moment estimations. Should be in range (0.0, 1.0).
+        beta2 (Tensor): The exponential decay rate for the 2nd moment estimations. Should be in range (0.0, 1.0).
         eps (Tensor): Term added to the denominator to improve numerical stability. Should be greater than 0.
         lr (Tensor): Learning rate.
         weight_decay (Number): Weight decay. Should be equal to or greater than 0.
@@ -122,8 +122,8 @@ def _update_run_op_graph_kernel(beta1, beta2, eps, global_step, lr, weight_decay
     Update parameters.
 
     Args:
-        beta1 (Tensor): The exponential decay rate for the 1st moment estimates. Should be in range (0.0, 1.0).
-        beta2 (Tensor): The exponential decay rate for the 2nd moment estimates. Should be in range (0.0, 1.0).
+        beta1 (Tensor): The exponential decay rate for the 1st moment estimations. Should be in range (0.0, 1.0).
+        beta2 (Tensor): The exponential decay rate for the 2nd moment estimations. Should be in range (0.0, 1.0).
         eps (Tensor): Term added to the denominator to improve numerical stability. Should be greater than 0.
         lr (Tensor): Learning rate.
         weight_decay (Number): Weight decay. Should be equal to or greater than 0.
@@ -184,7 +184,7 @@ def _check_param_value(beta1, beta2, eps, prim_name):
 
 class Lamb(Optimizer):
     """
-    Lamb Dynamic LR.
+    Lamb Dynamic Learning Rate.
 
     LAMB is an optimization algorithm employing a layerwise adaptive large batch
     optimization technique. Refer to the paper `LARGE BATCH OPTIMIZATION FOR DEEP LEARNING: TRAINING BERT IN 76
@@ -214,16 +214,16 @@ class Lamb(Optimizer):
               the order will be followed in optimizer. There are no other keys in the `dict` and the parameters which
               in the value of 'order_params' should be in one of group parameters.
 
-        learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or graph for the learning rate.
-            When the learning_rate is a Iterable or a Tensor with dimension of 1, use dynamic learning rate, then
+        learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or a graph for the learning rate.
+            When the learning_rate is an Iterable or a 1-D Tensor, use dynamic learning rate, then
             the i-th step will take the i-th value as the learning rate. When the learning_rate is LearningRateSchedule,
             use dynamic learning rate, the i-th learning rate will be calculated during the process of training
-            according to the formula of LearningRateSchedule. When the learning_rate is a float or a Tensor with
-            dimension of 0, use fixed learning rate. Other cases are not supported. The float learning rate should be
+            according to the formula of LearningRateSchedule. When the learning_rate is a float or a 0-D
+            Tensor, use fixed learning rate. Other cases are not supported. The float learning rate should be
             equal to or greater than 0. If the type of `learning_rate` is int, it will be converted to float.
-        beta1 (float): The exponential decay rate for the 1st moment estimates. Default: 0.9.
+        beta1 (float): The exponential decay rate for the 1st moment estimations. Default: 0.9.
             Should be in range (0.0, 1.0).
-        beta2 (float): The exponential decay rate for the 2nd moment estimates. Default: 0.999.
+        beta2 (float): The exponential decay rate for the 2nd moment estimations. Default: 0.999.
             Should be in range (0.0, 1.0).
         eps (float): Term added to the denominator to improve numerical stability. Default: 1e-6.
             Should be greater than 0.
diff --git a/mindspore/nn/optim/lars.py b/mindspore/nn/optim/lars.py
index 91ca9a4b22a..65187c4cc90 100755
--- a/mindspore/nn/optim/lars.py
+++ b/mindspore/nn/optim/lars.py
@@ -58,12 +58,12 @@ class LARS(Optimizer):
         epsilon (float): Term added to the denominator to improve numerical stability. Default: 1e-05.
         coefficient (float): Trust coefficient for calculating the local learning rate. Default: 0.001.
         use_clip (bool): Whether to use clip operation for calculating the local learning rate. Default: False.
-        lars_filter (Function): A function to determine whether apply lars algorithm. Default:
+        lars_filter (Function): A function to determine whether to apply the LARS algorithm. Default:
                                 lambda x: 'LayerNorm' not in x.name and 'bias' not in x.name.
 
     Inputs:
-        - **gradients** (tuple[Tensor]) - The gradients of `params` in optimizer, the shape is
-          as same as the `params` in optimizer.
+        - **gradients** (tuple[Tensor]) - The gradients of `params` in the optimizer, the shape is the
+          same as the `params` in the optimizer.
 
     Outputs:
         Union[Tensor[bool], tuple[Parameter]], it depends on the output of `optimizer`.
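
Reviewer note: the default `lars_filter` quoted in the docstring above can be checked in isolation. The sketch below is plain Python and does not use MindSpore; `Param` is a hypothetical stand-in for a parameter object that only carries a `name`, and the parameter names are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a framework parameter; only `name` is needed here.
Param = namedtuple("Param", ["name"])

# Default filter as quoted in the LARS docstring.
lars_filter = lambda x: 'LayerNorm' not in x.name and 'bias' not in x.name

# Illustrative parameter names (assumptions, not taken from a real network).
params = [Param("conv1.weight"), Param("conv1.bias"), Param("LayerNorm.gamma")]

# Parameters for which the filter returns True are the ones LARS would apply to.
selected = [p.name for p in params if lars_filter(p)]
print(selected)
```

Under these assumptions only the weight is selected; the bias and LayerNorm parameters are skipped.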
diff --git a/mindspore/nn/optim/lazyadam.py b/mindspore/nn/optim/lazyadam.py
index 788c68a2a8d..a39160d2e9d 100644
--- a/mindspore/nn/optim/lazyadam.py
+++ b/mindspore/nn/optim/lazyadam.py
@@ -127,26 +127,26 @@ class LazyAdam(Optimizer):
               the order will be followed in optimizer. There are no other keys in the `dict` and the parameters which
               in the value of 'order_params' should be in one of group parameters.
 
-        learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or graph for the learning rate.
-            When the learning_rate is a Iterable or a Tensor with dimension of 1, use dynamic learning rate, then
+        learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or a graph for the learning rate.
+            When the learning_rate is an Iterable or a 1-D Tensor, use dynamic learning rate, then
             the i-th step will take the i-th value as the learning rate. When the learning_rate is LearningRateSchedule,
             use dynamic learning rate, the i-th learning rate will be calculated during the process of training
-            according to the formula of LearningRateSchedule. When the learning_rate is a float or a Tensor with
-            dimension of 0, use fixed learning rate. Other cases are not supported. The float learning rate should be
+            according to the formula of LearningRateSchedule. When the learning_rate is a float or a 0-D
+            Tensor, use fixed learning rate. Other cases are not supported. The float learning rate should be
             equal to or greater than 0. If the type of `learning_rate` is int, it will be converted to float.
             Default: 1e-3.
-        beta1 (float): The exponential decay rate for the 1st moment estimates. Should be in range (0.0, 1.0). Default:
-                       0.9.
-        beta2 (float): The exponential decay rate for the 2nd moment estimates. Should be in range (0.0, 1.0). Default:
-                       0.999.
+        beta1 (float): The exponential decay rate for the 1st moment estimations. Should be in range (0.0, 1.0).
+                       Default: 0.9.
+        beta2 (float): The exponential decay rate for the 2nd moment estimations. Should be in range (0.0, 1.0).
+                       Default: 0.999.
         eps (float): Term added to the denominator to improve numerical stability. Should be greater than 0. Default:
                      1e-8.
         use_locking (bool): Whether to enable a lock to protect updating variable tensors.
-            If True, updating of the var, m, and v tensors will be protected by a lock.
-            If False, the result is unpredictable. Default: False.
+            If true, updates of the var, m, and v tensors will be protected by a lock.
+            If false, the result is unpredictable. Default: False.
         use_nesterov (bool): Whether to use Nesterov Accelerated Gradient (NAG) algorithm to update the gradients.
-            If True, updates the gradients using NAG.
-            If False, updates the gradients without using NAG. Default: False.
+            If true, update the gradients using NAG.
+            If false, update the gradients without using NAG. Default: False.
         weight_decay (float): Weight decay (L2 penalty). Default: 0.0.
         loss_scale (float): A floating point value for the loss scale. Should be equal to or greater than 1. Default:
                             1.0.
diff --git a/mindspore/nn/optim/momentum.py b/mindspore/nn/optim/momentum.py
index 7781e52d57c..6b501232c89 100755
--- a/mindspore/nn/optim/momentum.py
+++ b/mindspore/nn/optim/momentum.py
@@ -83,12 +83,12 @@ class Momentum(Optimizer):
               the order will be followed in optimizer. There are no other keys in the `dict` and the parameters which
               in the value of 'order_params' should be in one of group parameters.
 
-        learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or graph for the learning rate.
-            When the learning_rate is a Iterable or a Tensor with dimension of 1, use dynamic learning rate, then
+        learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or a graph for the learning rate.
+            When the learning_rate is an Iterable or a 1-D Tensor, use dynamic learning rate, then
             the i-th step will take the i-th value as the learning rate. When the learning_rate is LearningRateSchedule,
             use dynamic learning rate, the i-th learning rate will be calculated during the process of training
-            according to the formula of LearningRateSchedule. When the learning_rate is a float or a Tensor with
-            dimension of 0, use fixed learning rate. Other cases are not supported. The float learning rate should be
+            according to the formula of LearningRateSchedule. When the learning_rate is a float or a 0-D
+            Tensor, use fixed learning rate. Other cases are not supported. The float learning rate should be
             equal to or greater than 0. If the type of `learning_rate` is int, it will be converted to float.
         momentum (float): Hyperparameter of type float, means momentum for the moving average.
             It should be at least 0.0.
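
Reviewer note: the `learning_rate` rules repeated across these docstrings (fixed for a float, per-step lookup for an Iterable, int converted to float) can be sketched in plain Python. `learning_rate_at` is a hypothetical helper, not a MindSpore API; Tensor and LearningRateSchedule inputs are left out, and 0-based step indexing is an assumption.

```python
from collections.abc import Iterable


def learning_rate_at(learning_rate, step):
    """Return the learning rate used at `step` (assumed 0-based)."""
    # An int is converted to float, as the docstring states.
    if isinstance(learning_rate, int) and not isinstance(learning_rate, bool):
        learning_rate = float(learning_rate)
    if isinstance(learning_rate, float):
        # Fixed learning rate: must be equal to or greater than 0.
        if learning_rate < 0:
            raise ValueError("learning_rate should be equal to or greater than 0")
        return learning_rate
    if isinstance(learning_rate, Iterable) and not isinstance(learning_rate, str):
        # Dynamic learning rate: the i-th step takes the i-th value.
        return float(list(learning_rate)[step])
    raise TypeError("unsupported learning_rate type in this sketch")


print(learning_rate_at(0.1, 5))
print(learning_rate_at([0.1, 0.05, 0.01], 1))
```

The first call returns the fixed value regardless of the step; the second picks the entry at index 1 of the schedule.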
diff --git a/mindspore/nn/optim/optimizer.py b/mindspore/nn/optim/optimizer.py
index e4990608135..7cbc9fa0977 100755
--- a/mindspore/nn/optim/optimizer.py
+++ b/mindspore/nn/optim/optimizer.py
@@ -40,8 +40,6 @@ class Optimizer(Cell):
     """
     Base class for all optimizers.
 
-    This class defines the API to add Ops to train a model.
-
     Note:
         This class defines the API to add Ops to train a model. Never use
         this class directly, but instead instantiate one of its subclasses.
@@ -55,12 +53,12 @@ class Optimizer(Cell):
         To improve parameter groups performance, the customized order of parameters can be supported.
 
     Args:
-        learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or graph for the learning
-            rate. When the learning_rate is a Iterable or a Tensor with dimension of 1, use dynamic learning rate, then
+        learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or a graph for the learning
+            rate. When the learning_rate is an Iterable or a 1-D Tensor, use dynamic learning rate, then
             the i-th step will take the i-th value as the learning rate. When the learning_rate is LearningRateSchedule,
             use dynamic learning rate, the i-th learning rate will be calculated during the process of training
-            according to the formula of LearningRateSchedule. When the learning_rate is a float or a Tensor with
-            dimension of 0, use fixed learning rate. Other cases are not supported. The float learning rate should be
+            according to the formula of LearningRateSchedule. When the learning_rate is a float or a 0-D
+            Tensor, use fixed learning rate. Other cases are not supported. The float learning rate should be
             equal to or greater than 0. If the type of `learning_rate` is int, it will be converted to float.
         parameters (Union[list[Parameter], list[dict]]): When the `parameters` is a list of `Parameter` which will be
             updated, the element in `parameters` should be class `Parameter`. When the `parameters` is a list of `dict`,
@@ -84,8 +82,8 @@ class Optimizer(Cell):
             type of `loss_scale` input is int, it will be converted to float. Default: 1.0.
 
     Raises:
-        ValueError: If the learning_rate is a Tensor, but the dims of tensor is greater than 1.
-        TypeError: If the learning_rate is not any of the three types: float, Tensor, Iterable.
+        ValueError: If the learning_rate is a Tensor, but the dimension of the tensor is greater than 1.
+        TypeError: If the learning_rate is not one of the three types: float, Tensor, or Iterable.
     """
 
     def __init__(self, learning_rate, parameters, weight_decay=0.0, loss_scale=1.0):
@@ -179,7 +177,7 @@ class Optimizer(Cell):
         An approach to reduce the overfitting of a deep learning neural network model.
 
         Args:
-            gradients (tuple[Tensor]): The gradients of `self.parameters`, and have the same shape with
+            gradients (tuple[Tensor]): The gradients of `self.parameters`, and have the same shape as
                 `self.parameters`.
 
         Returns:
@@ -204,7 +202,7 @@ class Optimizer(Cell):
         network.
 
         Args:
-            gradients (tuple[Tensor]): The gradients of `self.parameters`, and have the same shape with
+            gradients (tuple[Tensor]): The gradients of `self.parameters`, and have the same shape as
                 `self.parameters`.
 
         Returns:
diff --git a/mindspore/nn/optim/proximal_ada_grad.py b/mindspore/nn/optim/proximal_ada_grad.py
index c21874c8fd6..12287c59d02 100644
--- a/mindspore/nn/optim/proximal_ada_grad.py
+++ b/mindspore/nn/optim/proximal_ada_grad.py
@@ -87,22 +87,22 @@ class ProximalAdagrad(Optimizer):
               in the value of 'order_params' should be in one of group parameters.
 
         accum (float): The starting value for accumulators, must be zero or positive values. Default: 0.1.
-        learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or graph for the learning rate.
-            When the learning_rate is a Iterable or a Tensor with dimension of 1, use dynamic learning rate, then
+        learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or a graph for the learning rate.
+            When the learning_rate is an Iterable or a 1-D Tensor, use dynamic learning rate, then
             the i-th step will take the i-th value as the learning rate. When the learning_rate is LearningRateSchedule,
             use dynamic learning rate, the i-th learning rate will be calculated during the process of training
-            according to the formula of LearningRateSchedule. When the learning_rate is a float or a Tensor with
-            dimension of 0, use fixed learning rate. Other cases are not supported. The float learning rate should be
+            according to the formula of LearningRateSchedule. When the learning_rate is a float or a 0-D
+            Tensor, use fixed learning rate. Other cases are not supported. The float learning rate should be
             equal to or greater than 0. If the type of `learning_rate` is int, it will be converted to float.
             Default: 0.001.
         l1 (float): l1 regularization strength, must be greater than or equal to zero. Default: 0.0.
         l2 (float): l2 regularization strength, must be greater than or equal to zero. Default: 0.0.
-        use_locking (bool): If True use locks for update operation. Default: False.
+        use_locking (bool): If True, use locks for the update operation. Default: False.
         loss_scale (float): Value for the loss scale. It should be greater than 0.0. Default: 1.0.
         weight_decay (float): Weight decay value to multiply weight, must be zero or positive value. Default: 0.0.
 
     Inputs:
-        - **grads** (tuple[Tensor]) - The gradients of `params` in optimizer, the shape is as same as the `params`
+        - **grads** (tuple[Tensor]) - The gradients of `params` in the optimizer, the shape is the same as the `params`
           in optimizer.
 
     Outputs:
diff --git a/mindspore/nn/optim/rmsprop.py b/mindspore/nn/optim/rmsprop.py
index fc5ebc8df92..c646a790bf5 100644
--- a/mindspore/nn/optim/rmsprop.py
+++ b/mindspore/nn/optim/rmsprop.py
@@ -106,12 +106,12 @@ class RMSProp(Optimizer):
               the order will be followed in optimizer. There are no other keys in the `dict` and the parameters which
               in the value of 'order_params' should be in one of group parameters.
 
-        learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or graph for the learning rate.
-            When the learning_rate is a Iterable or a Tensor with dimension of 1, use dynamic learning rate, then
+        learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or a graph for the learning rate.
+            When the learning_rate is an Iterable or a 1-D Tensor, use dynamic learning rate, then
             the i-th step will take the i-th value as the learning rate. When the learning_rate is LearningRateSchedule,
             use dynamic learning rate, the i-th learning rate will be calculated during the process of training
-            according to the formula of LearningRateSchedule. When the learning_rate is a float or a Tensor with
-            dimension of 0, use fixed learning rate. Other cases are not supported. The float learning rate should be
+            according to the formula of LearningRateSchedule. When the learning_rate is a float or a 0-D
+            Tensor, use fixed learning rate. Other cases are not supported. The float learning rate should be
             equal to or greater than 0. If the type of `learning_rate` is int, it will be converted to float.
             Default: 0.1.
         decay (float): Decay rate. Should be equal to or greater than 0. Default: 0.9.
diff --git a/mindspore/nn/optim/sgd.py b/mindspore/nn/optim/sgd.py
index e684fae22fc..216f2112f36 100755
--- a/mindspore/nn/optim/sgd.py
+++ b/mindspore/nn/optim/sgd.py
@@ -78,12 +78,12 @@ class SGD(Optimizer):
               the order will be followed in optimizer. There are no other keys in the `dict` and the parameters which
               in the value of 'order_params' should be in one of group parameters.
 
-        learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or graph for the learning rate.
-            When the learning_rate is a Iterable or a Tensor with dimension of 1, use dynamic learning rate, then
+        learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or a graph for the learning rate.
+            When the learning_rate is an Iterable or a 1-D Tensor, use dynamic learning rate, then
             the i-th step will take the i-th value as the learning rate. When the learning_rate is LearningRateSchedule,
             use dynamic learning rate, the i-th learning rate will be calculated during the process of training
-            according to the formula of LearningRateSchedule. When the learning_rate is a float or a Tensor with
-            dimension of 0, use fixed learning rate. Other cases are not supported. The float learning rate should be
+            according to the formula of LearningRateSchedule. When the learning_rate is a float or a 0-D
+            Tensor, use fixed learning rate. Other cases are not supported. The float learning rate should be
             equal to or greater than 0. If the type of `learning_rate` is int, it will be converted to float.
             Default: 0.1.
         momentum (float): A floating point value the momentum. should be at least 0.0. Default: 0.0.
diff --git a/mindspore/nn/wrap/cell_wrapper.py b/mindspore/nn/wrap/cell_wrapper.py
index 980585e2700..9a3539373e7 100644
--- a/mindspore/nn/wrap/cell_wrapper.py
+++ b/mindspore/nn/wrap/cell_wrapper.py
@@ -138,9 +138,9 @@ class TrainOneStepCell(Cell):
     r"""
     Network training package class.
 
-    Wraps the network with an optimizer. The resulting Cell be trained with input *inputs.
-    Backward graph will be created in the construct function to do parameter updating. Different
-    parallel modes are available to run the training.
+    Wraps the network with an optimizer. The resulting Cell can be trained with input *inputs.
+    The backward graph will be created in the construct function to update the parameters. Different
+    parallel modes are available for training.
 
     Args:
         network (Cell): The training network.
@@ -231,14 +231,14 @@ class DataWrapper(Cell):
 
 class GetNextSingleOp(Cell):
     """
-    Cell to run get next operation.
+    Cell to run the GetNext operation.
 
     Args:
         dataset_types (list[:class:`mindspore.dtype`]): The types of dataset.
         dataset_shapes (list[tuple[int]]): The shapes of dataset.
         queue_name (str): Queue name to fetch the data.
 
-    Detailed information, please refer to `ops.operations.GetNext`.
+    For detailed information, refer to `ops.operations.GetNext`.
     """
 
     def __init__(self, dataset_types, dataset_shapes, queue_name):
@@ -360,7 +360,7 @@ class ParameterUpdate(Cell):
         param (Parameter): The parameter to be updated manually.
 
     Raises:
-        KeyError: If parameter with the specified name do not exist.
+        KeyError: If parameter with the specified name does not exist.
 
     Examples:
         >>> network = Net()
diff --git a/mindspore/nn/wrap/grad_reducer.py b/mindspore/nn/wrap/grad_reducer.py
index 66543a16259..68f676ec664 100644
--- a/mindspore/nn/wrap/grad_reducer.py
+++ b/mindspore/nn/wrap/grad_reducer.py
@@ -329,7 +329,7 @@ class DistributedGradReducer(Cell):
 
     def construct(self, grads):
         """
-        In some circumstances, the data precision of grads could be mixed with float16 and float32. Thus, the
+        Under certain circumstances, the data precision of grads could be mixed with float16 and float32. Thus, the
         result of AllReduce is unreliable. To solve the problem, grads should be cast to float32 before AllReduce,
         and cast back after the operation.
 
diff --git a/mindspore/nn/wrap/loss_scale.py b/mindspore/nn/wrap/loss_scale.py
index 08ff30b4b49..3bfc7170f10 100644
--- a/mindspore/nn/wrap/loss_scale.py
+++ b/mindspore/nn/wrap/loss_scale.py
@@ -54,8 +54,8 @@ class DynamicLossScaleUpdateCell(Cell):
     Dynamic Loss scale update cell.
 
     For loss scaling training, the initial loss scaling value will be set to be `loss_scale_value`.
-    In every training step, the loss scaling value  will be updated by loss scaling value/`scale_factor`
-    when there is overflow. And it will be increased by loss scaling value * `scale_factor` if there is no
+    In each training step, the loss scaling value will be updated to loss scaling value / `scale_factor`
+    when there is an overflow. It will be increased to loss scaling value * `scale_factor` if there is no
     overflow for a continuous `scale_window` steps. This cell is used for Graph mode training in which all
     logic will be executed on device side(Another training mode is normal(non-sink) mode in which some logic will be
     executed on host).
@@ -133,7 +133,7 @@ class FixedLossScaleUpdateCell(Cell):
     """
     Static scale update cell, the loss scaling value will not be updated.
 
-    For usage please refer to `DynamicLossScaleUpdateCell`.
+    For usage, refer to `DynamicLossScaleUpdateCell`.
 
     Args:
         loss_scale_value (float): Init loss scale.
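
Reviewer note: the update rule described in the `DynamicLossScaleUpdateCell` docstring can be sketched in plain Python. `LossScaleTracker` is a hypothetical class, not the MindSpore cell; resetting the step counter after an increase and applying no lower bound to the scale are assumptions of this sketch.

```python
class LossScaleTracker:
    """Hypothetical host-side model of the dynamic loss scale rule."""

    def __init__(self, loss_scale_value, scale_factor, scale_window):
        self.loss_scale = float(loss_scale_value)
        self.scale_factor = scale_factor
        self.scale_window = scale_window
        self.good_steps = 0  # consecutive steps without overflow

    def update(self, overflow):
        if overflow:
            # Overflow: shrink the scale and restart the window.
            self.loss_scale /= self.scale_factor
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps >= self.scale_window:
                # `scale_window` clean steps in a row: grow the scale.
                self.loss_scale *= self.scale_factor
                self.good_steps = 0  # assumption: the window restarts
        return self.loss_scale


tracker = LossScaleTracker(loss_scale_value=1024, scale_factor=2, scale_window=3)
history = [tracker.update(flag) for flag in (True, False, False, False)]
print(history)
```

With these inputs the scale halves on the overflow step and doubles again after three clean steps.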
diff --git a/mindspore/ops/composite/multitype_ops/getitem_impl.py b/mindspore/ops/composite/multitype_ops/getitem_impl.py
index ffd5ea4d62b..48e4c71ca69 100644
--- a/mindspore/ops/composite/multitype_ops/getitem_impl.py
+++ b/mindspore/ops/composite/multitype_ops/getitem_impl.py
@@ -57,7 +57,7 @@ class _TupleGetItemTensor(base.TupleGetItemTensor_):
         data (tuple): A tuple of items.
         index (Tensor): The index in tensor.
     Outputs:
-        Type, is same as the element type of data.
+        Type, is the same as the element type of data.
     """
 
     def __init__(self, name):
@@ -81,7 +81,7 @@ def _tuple_getitem_by_number(data, number_index):
         number_index (Number): Index in scalar.
 
     Outputs:
-        Type, is same as the element type of data.
+        Type, is the same as the element type of data.
     """
     return F.tuple_getitem(data, number_index)
 
@@ -96,7 +96,7 @@ def _tuple_getitem_by_slice(data, slice_index):
         slice_index (Slice): Index in slice.
 
     Outputs:
-        Tuple, element type is same as the element type of data.
+        Tuple, element type is the same as the element type of data.
     """
     return _tuple_slice(data, slice_index)
 
@@ -111,7 +111,7 @@ def _tuple_getitem_by_tensor(data, tensor_index):
         tensor_index (Tensor): Index to select item.
 
     Outputs:
-        Type, is same as the element type of data.
+        Type, is the same as the element type of data.
     """
     return _tuple_get_item_tensor(data, tensor_index)
 
@@ -126,7 +126,7 @@ def _list_getitem_by_number(data, number_index):
         number_index (Number): Index in scalar.
 
     Outputs:
-        Type is same as the element type of data.
+        Type is the same as the element type of data.
     """
     return F.list_getitem(data, number_index)
 
@@ -186,7 +186,7 @@ def _tensor_getitem_by_slice(data, slice_index):
         slice_index (Slice): Index in slice.
 
     Outputs:
-        Tensor, element type is same as the element type of data.
+        Tensor, element type is the same as the element type of data.
     """
     return compile_utils.tensor_index_by_slice(data, slice_index)
 
@@ -201,7 +201,7 @@ def _tensor_getitem_by_tensor(data, tensor_index):
         tensor_index (Tensor): An index expressed by tensor.
 
     Outputs:
-        Tensor, element type is same as the element type of data.
+        Tensor, element type is the same as the element type of data.
     """
     return compile_utils.tensor_index_by_tensor(data, tensor_index)
 
@@ -216,7 +216,7 @@ def _tensor_getitem_by_tuple(data, tuple_index):
         tuple_index (tuple): Index in tuple.
 
     Outputs:
-        Tensor, element type is same as the element type of data.
+        Tensor, element type is the same as the element type of data.
     """
     return compile_utils.tensor_index_by_tuple(data, tuple_index)
 
diff --git a/mindspore/ops/composite/multitype_ops/setitem_impl.py b/mindspore/ops/composite/multitype_ops/setitem_impl.py
index 30c943b69f9..c1c0e941a65 100644
--- a/mindspore/ops/composite/multitype_ops/setitem_impl.py
+++ b/mindspore/ops/composite/multitype_ops/setitem_impl.py
@@ -32,7 +32,7 @@ def _list_setitem_with_string(data, number_index, value):
         number_index (Number): Index of data.
 
     Outputs:
-        list, type is same as the element type of data.
+        list, type is the same as the element type of data.
     """
     return F.list_setitem(data, number_index, value)
 
@@ -48,7 +48,7 @@ def _list_setitem_with_number(data, number_index, value):
         value (Number): Value given.
 
     Outputs:
-        list, type is same as the element type of data.
+        list, type is the same as the element type of data.
     """
     return F.list_setitem(data, number_index, value)
 
@@ -64,7 +64,7 @@ def _list_setitem_with_Tensor(data, number_index, value):
         value (Tensor): Value given.
 
     Outputs:
-        list, type is same as the element type of data.
+        list, type is the same as the element type of data.
     """
     return F.list_setitem(data, number_index, value)
 
@@ -80,7 +80,7 @@ def _list_setitem_with_List(data, number_index, value):
         value (list): Value given.
 
     Outputs:
-        list, type is same as the element type of data.
+        list, type is the same as the element type of data.
     """
     return F.list_setitem(data, number_index, value)
 
@@ -96,7 +96,7 @@ def _list_setitem_with_Tuple(data, number_index, value):
         value (list): Value given.
 
     Outputs:
-        list, type is same as the element type of data.
+        list, type is the same as the element type of data.
     """
     return F.list_setitem(data, number_index, value)
 
diff --git a/mindspore/ops/operations/_inner_ops.py b/mindspore/ops/operations/_inner_ops.py
index a691725719c..8d90c487e88 100644
--- a/mindspore/ops/operations/_inner_ops.py
+++ b/mindspore/ops/operations/_inner_ops.py
@@ -158,18 +158,18 @@ class ExtractImagePatches(PrimitiveWithInfer):
     The input tensor must be a 4-D tensor and the data format is NHWC.
 
     Args:
-        ksizes (Union[tuple[int], list[int]]): The size of sliding window, should be a tuple or list of int,
+        ksizes (Union[tuple[int], list[int]]): The size of sliding window, should be a tuple or a list of integers,
             and the format is [1, ksize_row, ksize_col, 1].
         strides (Union[tuple[int], list[int]]): Distance between the centers of the two consecutive patches,
             should be a tuple or list of int, and the format is [1, stride_row, stride_col, 1].
-        rates (Union[tuple[int], list[int]]): In each extracted patch, the gap between the corresponding dim
-            pixel positions, should be a tuple or list of int, and the format is [1, rate_row, rate_col, 1].
+        rates (Union[tuple[int], list[int]]): In each extracted patch, the gap between the corresponding dimension
+            pixel positions, should be a tuple or a list of integers, and the format is [1, rate_row, rate_col, 1].
         padding (str): The type of padding algorithm, is a string whose value is "same" or "valid",
             not case sensitive. Default: "valid".
 
             - same: Means that the patch can take the part beyond the original image, and this part is filled with 0.
 
-            - valid: Means that the patch area taken must be completely contained in the original image.
+            - valid: Means that the extracted patch area must be completely contained in the original image.
 
     Inputs:
         - **input_x** (Tensor) - A 4-D tensor whose shape is [in_batch, in_row, in_col, in_depth] and
@@ -177,7 +177,7 @@ class ExtractImagePatches(PrimitiveWithInfer):
 
     Outputs:
         Tensor, a 4-D tensor whose data type is same as 'input_x',
-        and the shape is [out_batch, out_row, out_col, out_depth], the out_batch is same as the in_batch.
+        and the shape is [out_batch, out_row, out_col, out_depth], the out_batch is the same as the in_batch.
     """
 
     @prim_attr_register
@@ -436,8 +436,8 @@ class MatrixDiag(PrimitiveWithInfer):
     Returns a batched diagonal tensor with a given batched diagonal values.
 
     Inputs:
-        - **x** (Tensor) - A tensor which to be element-wise multi by `assist`. It can be of the following data types:
-          float32, float16, int32, int8, uint8.
+        - **x** (Tensor) - A tensor to be multiplied element-wise by `assist`. It can be one of the following data
+          types: float32, float16, int32, int8, and uint8.
         - **assist** (Tensor) - A eye tensor of the same type as `x`. It's rank must greater than or equal to 2 and
           it's last dimension must equal to the second to last dimension.
 
@@ -490,7 +490,7 @@ class MatrixDiagPart(PrimitiveWithInfer):
     Returns the batched diagonal part of a batched tensor.
 
     Inputs:
-        - **x** (Tensor) - The batched tensor. It can be of the following data types:
+        - **x** (Tensor) - The batched tensor. It can be one of the following data types:
           float32, float16, int32, int8, uint8.
         - **assist** (Tensor) - A eye tensor of the same type as `x`. With shape same as `x`.
 
@@ -531,7 +531,7 @@ class MatrixSetDiag(PrimitiveWithInfer):
     Modify the batched diagonal part of a batched tensor.
 
     Inputs:
-        - **x** (Tensor) - The batched tensor. It can be of the following data types:
+        - **x** (Tensor) - The batched tensor. It can be one of the following data types:
           float32, float16, int32, int8, uint8.
         - **assist** (Tensor) - A eye tensor of the same type as `x`. With shape same as `x`.
         - **diagonal** (Tensor) - The diagonal values.
diff --git a/mindspore/ops/operations/_quant_ops.py b/mindspore/ops/operations/_quant_ops.py
index d34e322bb5d..21752c461d7 100644
--- a/mindspore/ops/operations/_quant_ops.py
+++ b/mindspore/ops/operations/_quant_ops.py
@@ -178,8 +178,8 @@ class FakeQuantPerLayer(PrimitiveWithInfer):
         quant_delay (int): Quantilization delay parameter. Before delay step in training time not update
             simulate quantization aware funcion. After delay step in training time begin simulate the aware
             quantize funcion. Default: 0.
-        symmetric (bool): Quantization algorithm use symmetric or not. Default: False.
-        narrow_range (bool): Quantization algorithm use narrow range or not. Default: False.
+        symmetric (bool): Whether the quantization algorithm is symmetric or not. Default: False.
+        narrow_range (bool): Whether the quantization algorithm uses narrow range or not. Default: False.
         training (bool): Training the network or not. Default: True.
 
     Inputs:
@@ -318,8 +318,8 @@ class FakeQuantPerChannel(PrimitiveWithInfer):
         quant_delay (int): Quantilization delay  parameter. Before delay step in training time not
             update the weight data to simulate quantize operation. After delay step in training time
             begin simulate the quantize operation. Default: 0.
-        symmetric (bool): Quantization algorithm use symmetric or not. Default: False.
-        narrow_range (bool): Quantization algorithm use narrow range or not. Default: False.
+        symmetric (bool): Whether the quantization algorithm is symmetric or not. Default: False.
+        narrow_range (bool): Whether the quantization algorithm uses narrow range or not. Default: False.
         training (bool): Training the network or not. Default: True.
         channel_axis (int): Quantization by channel axis. Ascend backend only supports 0 or 1. Default: 1.
 
diff --git a/mindspore/ops/operations/array_ops.py b/mindspore/ops/operations/array_ops.py
index 569a9c2adf4..fcc451323f0 100644
--- a/mindspore/ops/operations/array_ops.py
+++ b/mindspore/ops/operations/array_ops.py
@@ -3359,7 +3359,7 @@ class InplaceUpdate(PrimitiveWithInfer):
         indices (Union[int, tuple]): Indices into the left-most dimension of `x`.
 
     Inputs:
-        - **x** (Tensor) - A tensor which to be inplace updated. It can be of the following data types:
+        - **x** (Tensor) - A tensor to be updated in place. It can be one of the following data types:
           float32, float16, int32.
         - **v** (Tensor) - A tensor of the same type as `x`. Same dimension size as `x` except
           the first dimension, which must be the same as the size of `indices`.
@@ -3474,7 +3474,7 @@ class TransShape(PrimitiveWithInfer):
         - **out_shape** (tuple[int]) - The shape of output data.
 
     Outputs:
-        Tensor, a tensor whose data type is same as 'input_x', and the shape is same as the `out_shape`.
+        Tensor, a tensor whose data type is the same as `input_x`, and whose shape is the same as `out_shape`.
     """
     @prim_attr_register
     def __init__(self):
diff --git a/mindspore/ops/operations/inner_ops.py b/mindspore/ops/operations/inner_ops.py
index ad2ae3e955f..825fc99c110 100644
--- a/mindspore/ops/operations/inner_ops.py
+++ b/mindspore/ops/operations/inner_ops.py
@@ -31,7 +31,7 @@ class ScalarCast(PrimitiveWithInfer):
         - **input_y** (mindspore.dtype) - The type should cast to be. Only constant value is allowed.
 
     Outputs:
-        Scalar. The type is same as the python type corresponding to `input_y`.
+        Scalar. The type is the same as the Python type corresponding to `input_y`.
 
     Examples:
         >>> scalar_cast = P.ScalarCast()
diff --git a/mindspore/ops/operations/math_ops.py b/mindspore/ops/operations/math_ops.py
index 572556567da..ae16a4e92cc 100644
--- a/mindspore/ops/operations/math_ops.py
+++ b/mindspore/ops/operations/math_ops.py
@@ -132,7 +132,7 @@ class TensorAdd(_MathBinaryOp):
           a bool when the first input is a tensor or a tensor whose data type is number or bool.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,
+        Tensor, the shape is the same as the shape after broadcasting,
         and the data type is the one with high precision or high digits among the two inputs.
 
     Examples:
@@ -1067,7 +1067,7 @@ class Sub(_MathBinaryOp):
           a bool when the first input is a tensor or a tensor whose data type is number or bool.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,
+        Tensor, the shape is the same as the shape after broadcasting,
         and the data type is the one with high precision or high digits among the two inputs.
 
     Examples:
@@ -1105,7 +1105,7 @@ class Mul(_MathBinaryOp):
           a bool when the first input is a tensor or a tensor whose data type is number or bool.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,
+        Tensor, the shape is the same as the shape after broadcasting,
         and the data type is the one with high precision or high digits among the two inputs.
 
     Examples:
@@ -1144,7 +1144,7 @@ class SquaredDifference(_MathBinaryOp):
           float16, float32, int32 or bool.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,
+        Tensor, the shape is the same as the shape after broadcasting,
         and the data type is the one with high precision or high digits among the two inputs.
 
     Examples:
@@ -1333,7 +1333,7 @@ class Pow(_MathBinaryOp):
           a bool when the first input is a tensor or a tensor whose data type is number or bool.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,
+        Tensor, the shape is the same as the shape after broadcasting,
         and the data type is the one with high precision or high digits among the two inputs.
 
     Examples:
@@ -1618,7 +1618,7 @@ class Minimum(_MathBinaryOp):
           a bool when the first input is a tensor or a tensor whose data type is number or bool.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,
+        Tensor, the shape is the same as the shape after broadcasting,
         and the data type is the one with high precision or high digits among the two inputs.
 
     Examples:
@@ -1656,7 +1656,7 @@ class Maximum(_MathBinaryOp):
           a bool when the first input is a tensor or a tensor whose data type is number or bool.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,
+        Tensor, the shape is the same as the shape after broadcasting,
         and the data type is the one with high precision or high digits among the two inputs.
 
     Examples:
@@ -1694,7 +1694,7 @@ class RealDiv(_MathBinaryOp):
           a bool when the first input is a tensor or a tensor whose data type is number or bool.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,
+        Tensor, the shape is the same as the shape after broadcasting,
         and the data type is the one with high precision or high digits among the two inputs.
 
     Examples:
@@ -1733,7 +1733,7 @@ class Div(_MathBinaryOp):
           is a number or a bool, the second input should be a tensor whose data type is number or bool.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,
+        Tensor, the shape is the same as the shape after broadcasting,
         and the data type is the one with high precision or high digits among the two inputs.
 
     Raises:
@@ -1772,7 +1772,7 @@ class DivNoNan(_MathBinaryOp):
           a bool when the first input is a tensor or a tensor whose data type is number or bool.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,
+        Tensor, the shape is the same as the shape after broadcasting,
         and the data type is the one with high precision or high digits among the two inputs.
 
     Raises:
@@ -1814,7 +1814,7 @@ class FloorDiv(_MathBinaryOp):
           a bool when the first input is a tensor or a tensor whose data type is number or bool.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,
+        Tensor, the shape is the same as the shape after broadcasting,
         and the data type is the one with high precision or high digits among the two inputs.
 
     Examples:
@@ -1844,7 +1844,7 @@ class TruncateDiv(_MathBinaryOp):
           a bool when the first input is a tensor or a tensor whose data type is number or bool.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,
+        Tensor, the shape is the same as the shape after broadcasting,
         and the data type is the one with high precision or high digits among the two inputs.
 
     Examples:
@@ -1873,7 +1873,7 @@ class TruncateMod(_MathBinaryOp):
           a bool when the first input is a tensor or a tensor whose data type is number or bool.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,
+        Tensor, the shape is the same as the shape after broadcasting,
         and the data type is the one with high precision or high digits among the two inputs.
 
     Examples:
@@ -1900,7 +1900,7 @@ class Mod(_MathBinaryOp):
           the second input should be a tensor whose data type is number.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,
+        Tensor, the shape is the same as the shape after broadcasting,
         and the data type is the one with high precision or high digits among the two inputs.
 
     Raises:
@@ -1967,7 +1967,7 @@ class FloorMod(_MathBinaryOp):
           a bool when the first input is a tensor or a tensor whose data type is number or bool.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,
+        Tensor, the shape is the same as the shape after broadcasting,
         and the data type is the one with high precision or high digits among the two inputs.
 
     Examples:
@@ -2025,7 +2025,7 @@ class Xdivy(_MathBinaryOp):
           a bool when the first input is a tensor or a tensor whose data type is float16, float32 or bool.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,
+        Tensor, the shape is the same as the shape after broadcasting,
         and the data type is the one with high precision or high digits among the two inputs.
 
     Examples:
@@ -2059,7 +2059,7 @@ class Xlogy(_MathBinaryOp):
           The value must be positive.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,
+        Tensor, the shape is the same as the shape after broadcasting,
         and the data type is the one with high precision or high digits among the two inputs.
 
     Examples:
@@ -2219,7 +2219,7 @@ class Equal(_LogicBinaryOp):
           a bool when the first input is a tensor or a tensor whose data type is number or bool.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,and the data type is bool.
+        Tensor, the shape is the same as the shape after broadcasting, and the data type is bool.
 
     Examples:
         >>> input_x = Tensor(np.array([1, 2, 3]), mindspore.float32)
@@ -2250,7 +2250,7 @@ class ApproximateEqual(_LogicBinaryOp):
         - **x2** (Tensor) - A tensor of the same type and shape as 'x1'.
 
     Outputs:
-        Tensor, the shape is same as the shape of 'x1', and the data type is bool.
+        Tensor, the shape is the same as the shape of 'x1', and the data type is bool.
 
     Examples:
         >>> x1 = Tensor(np.array([1, 2, 3]), mindspore.float32)
@@ -2328,7 +2328,7 @@ class NotEqual(_LogicBinaryOp):
           a bool when the first input is a tensor or a tensor whose data type is number or bool.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,and the data type is bool.
+        Tensor, the shape is the same as the shape after broadcasting, and the data type is bool.
 
     Examples:
         >>> input_x = Tensor(np.array([1, 2, 3]), mindspore.float32)
@@ -2364,7 +2364,7 @@ class Greater(_LogicBinaryOp):
           a bool when the first input is a tensor or a tensor whose data type is number or bool.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,and the data type is bool.
+        Tensor, the shape is the same as the shape after broadcasting, and the data type is bool.
 
     Examples:
         >>> input_x = Tensor(np.array([1, 2, 3]), mindspore.int32)
@@ -2399,7 +2399,7 @@ class GreaterEqual(_LogicBinaryOp):
           a bool when the first input is a tensor or a tensor whose data type is number or bool.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,and the data type is bool.
+        Tensor, the shape is the same as the shape after broadcasting, and the data type is bool.
 
     Examples:
         >>> input_x = Tensor(np.array([1, 2, 3]), mindspore.int32)
@@ -2434,7 +2434,7 @@ class Less(_LogicBinaryOp):
           a bool when the first input is a tensor or a tensor whose data type is number or bool.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,and the data type is bool.
+        Tensor, the shape is the same as the shape after broadcasting, and the data type is bool.
 
     Examples:
         >>> input_x = Tensor(np.array([1, 2, 3]), mindspore.int32)
@@ -2469,7 +2469,7 @@ class LessEqual(_LogicBinaryOp):
           a bool when the first input is a tensor or a tensor whose data type is number or bool.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,and the data type is bool.
+        Tensor, the shape is the same as the shape after broadcasting, and the data type is bool.
 
     Examples:
         >>> input_x = Tensor(np.array([1, 2, 3]), mindspore.int32)
@@ -2495,7 +2495,7 @@ class LogicalNot(PrimitiveWithInfer):
         - **input_x** (Tensor) - The input tensor whose dtype is bool.
 
     Outputs:
-        Tensor, the shape is same as the `input_x`, and the dtype is bool.
+        Tensor, the shape is the same as the `input_x`, and the dtype is bool.
 
     Examples:
         >>> input_x = Tensor(np.array([True, False, True]), mindspore.bool_)
@@ -2533,7 +2533,7 @@ class LogicalAnd(_LogicBinaryOp):
           a tensor whose data type is bool.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting, and the data type is bool.
+        Tensor, the shape is the same as the shape after broadcasting, and the data type is bool.
 
     Examples:
         >>> input_x = Tensor(np.array([True, False, True]), mindspore.bool_)
@@ -2563,7 +2563,7 @@ class LogicalOr(_LogicBinaryOp):
           a tensor whose data type is bool.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,and the data type is bool.
+        Tensor, the shape is the same as the shape after broadcasting, and the data type is bool.
 
     Examples:
         >>> input_x = Tensor(np.array([True, False, True]), mindspore.bool_)
@@ -3182,7 +3182,7 @@ class Atan2(_MathBinaryOp):
         - **input_y** (Tensor) - The input tensor.
 
     Outputs:
-        Tensor, the shape is same as the shape after broadcasting,and the data type is same as `input_x`.
+        Tensor, the shape is the same as the shape after broadcasting, and the data type is the same as `input_x`.
 
     Examples:
          >>> input_x = Tensor(np.array([[0, 1]]), mindspore.float32)
diff --git a/mindspore/ops/operations/nn_ops.py b/mindspore/ops/operations/nn_ops.py
index aaa24c1572a..50316fab8f8 100644
--- a/mindspore/ops/operations/nn_ops.py
+++ b/mindspore/ops/operations/nn_ops.py
@@ -100,7 +100,7 @@ class Softmax(PrimitiveWithInfer):
     Softmax operation.
 
     Applies the Softmax operation to the input tensor on the specified axis.
-    Suppose a slice along the given aixs :math:`x` then for each element :math:`x_i`
+    Suppose a slice along the given axis is :math:`x`; then for each element :math:`x_i`
     the Softmax function is shown as follows:
 
     .. math::
@@ -151,7 +151,7 @@ class LogSoftmax(PrimitiveWithInfer):
     Log Softmax activation function.
 
     Applies the Log Softmax function to the input tensor on the specified axis.
-    Suppose a slice along the given aixs :math:`x` then for each element :math:`x_i`
+    Suppose a slice along the given axis is :math:`x`; then for each element :math:`x_i`
     the Log Softmax function is shown as follows:
 
     .. math::
@@ -429,7 +429,7 @@ class HSwish(PrimitiveWithInfer):
     .. math::
         \text{hswish}(x_{i}) = x_{i} * \frac{ReLU6(x_{i} + 3)}{6},
 
-    where :math:`x_{i}` is the :math:`i`-th slice along the given dim of the input Tensor.
+    where :math:`x_{i}` is the :math:`i`-th slice in the given dimension of the input Tensor.
 
     Inputs:
         - **input_data** (Tensor) - The input of HSwish, data type should be float16 or float32.
@@ -502,7 +502,7 @@ class HSigmoid(PrimitiveWithInfer):
     .. math::
         \text{hsigmoid}(x_{i}) = max(0, min(1, \frac{x_{i} + 3}{6})),
 
-    where :math:`x_{i}` is the :math:`i`-th slice along the given dim of the input Tensor.
+    where :math:`x_{i}` is the :math:`i`-th slice in the given dimension of the input Tensor.
 
     Inputs:
         - **input_data** (Tensor) - The input of HSigmoid, data type should be float16 or float32.
@@ -2234,7 +2234,7 @@ class DropoutDoMask(PrimitiveWithInfer):
           shape of `input_x` must be same as the value of `DropoutGenMask`'s input `shape`. If input wrong `mask`,
           the output of `DropoutDoMask` are unpredictable.
         - **keep_prob** (Tensor) - The keep rate, between 0 and 1, e.g. keep_prob = 0.9,
-          means dropping out 10% of input units. The value of `keep_prob` is same as the input `keep_prob` of
+          means dropping out 10% of input units. The value of `keep_prob` is the same as the input `keep_prob` of
           `DropoutGenMask`.
 
     Outputs:
@@ -2674,9 +2674,9 @@ class Pad(PrimitiveWithInfer):
 
     Args:
         paddings (tuple): The shape of parameter `paddings` is (N, 2). N is the rank of input data. All elements of
-            paddings are int type. For `D` th dimension of input, paddings[D, 0] indicates how many sizes to be
-            extended ahead of the `D` th dimension of the input tensor, and paddings[D, 1] indicates how many sizes to
-            be extended behind of the `D` th dimension of the input tensor.
+            paddings are int type. For the `D` th dimension of the input, paddings[D, 0] indicates how many
+            elements to add before the input tensor in that dimension, and paddings[D, 1] indicates how many
+            elements to add after it.
 
     Inputs:
         - **input_x** (Tensor) - The input tensor.
@@ -2733,9 +2733,9 @@ class MirrorPad(PrimitiveWithInfer):
         - **input_x** (Tensor) - The input tensor.
         - **paddings** (Tensor) - The paddings tensor. The value of `paddings` is a matrix(list),
           and its shape is (N, 2). N is the rank of input data. All elements of paddings
-          are int type. For `D` th dimension of input, paddings[D, 0] indicates how many sizes to be
-          extended ahead of the `D` th dimension of the input tensor, and paddings[D, 1] indicates
-          how many sizes to be extended behind of the `D` th dimension of the input tensor.
+          are int type. For the `D` th dimension of the input, paddings[D, 0] indicates how many elements to
+          add before the input tensor in that dimension, and paddings[D, 1] indicates how many elements to
+          add after it.
 
     Outputs:
         Tensor, the tensor after padding.
@@ -2880,11 +2880,11 @@ class Adam(PrimitiveWithInfer):
 
     Args:
         use_locking (bool): Whether to enable a lock to protect updating variable tensors.
-            If True, updating of the var, m, and v tensors will be protected by a lock.
-            If False, the result is unpredictable. Default: False.
+            If true, updates of the var, m, and v tensors will be protected by a lock.
+            If false, the result is unpredictable. Default: False.
         use_nesterov (bool): Whether to use Nesterov Accelerated Gradient (NAG) algorithm to update the gradients.
-            If True, updates the gradients using NAG.
-            If False, updates the gradients without using NAG. Default: False.
+            If true, update the gradients using NAG.
+            If false, update the gradients without using NAG. Default: False.
 
     Inputs:
         - **var** (Tensor) - Weights to be updated.
@@ -2894,8 +2894,8 @@ class Adam(PrimitiveWithInfer):
         - **beta1_power** (float) - :math:`beta_1^t` in the updating formula.
         - **beta2_power** (float) - :math:`beta_2^t` in the updating formula.
         - **lr** (float) - :math:`l` in the updating formula.
-        - **beta1** (float) - The exponential decay rate for the 1st moment estimates.
-        - **beta2** (float) - The exponential decay rate for the 2nd moment estimates.
+        - **beta1** (float) - The exponential decay rate for the 1st moment estimations.
+        - **beta2** (float) - The exponential decay rate for the 2nd moment estimations.
         - **epsilon** (float) - Term added to the denominator to improve numerical stability.
         - **gradient** (Tensor) - Gradients. Has the same type as `var`.
 
@@ -2974,11 +2974,11 @@ class FusedSparseAdam(PrimitiveWithInfer):
 
     Args:
         use_locking (bool): Whether to enable a lock to protect updating variable tensors.
-            If True, updating of the var, m, and v tensors will be protected by a lock.
-            If False, the result is unpredictable. Default: False.
+            If true, updates of the var, m, and v tensors will be protected by a lock.
+            If false, the result is unpredictable. Default: False.
         use_nesterov (bool): Whether to use Nesterov Accelerated Gradient (NAG) algorithm to update the gradients.
-            If True, updates the gradients using NAG.
-            If False, updates the gradients without using NAG. Default: False.
+            If true, update the gradients using NAG.
+            If false, update the gradients without using NAG. Default: False.
 
     Inputs:
         - **var** (Parameter) - Parameters to be updated. With float32 data type.
@@ -2989,8 +2989,8 @@ class FusedSparseAdam(PrimitiveWithInfer):
         - **beta1_power** (Tensor) - :math:`beta_1^t` in the updating formula. With float32 data type.
         - **beta2_power** (Tensor) - :math:`beta_2^t` in the updating formula. With float32 data type.
         - **lr** (Tensor) - :math:`l` in the updating formula. With float32 data type.
-        - **beta1** (Tensor) - The exponential decay rate for the 1st moment estimates. With float32 data type.
-        - **beta2** (Tensor) - The exponential decay rate for the 2nd moment estimates. With float32 data type.
+        - **beta1** (Tensor) - The exponential decay rate for the 1st moment estimations. With float32 data type.
+        - **beta2** (Tensor) - The exponential decay rate for the 2nd moment estimations. With float32 data type.
         - **epsilon** (Tensor) - Term added to the denominator to improve numerical stability. With float32 data type.
         - **gradient** (Tensor) - Gradient value. With float32 data type.
         - **indices** (Tensor) - Gradient indices. With int32 data type.
@@ -3108,11 +3108,11 @@ class FusedSparseLazyAdam(PrimitiveWithInfer):
 
     Args:
         use_locking (bool): Whether to enable a lock to protect updating variable tensors.
-            If True, updating of the var, m, and v tensors will be protected by a lock.
-            If False, the result is unpredictable. Default: False.
+            If true, updates of the var, m, and v tensors will be protected by a lock.
+            If false, the result is unpredictable. Default: False.
         use_nesterov (bool): Whether to use Nesterov Accelerated Gradient (NAG) algorithm to update the gradients.
-            If True, updates the gradients using NAG.
-            If False, updates the gradients without using NAG. Default: False.
+            If true, update the gradients using NAG.
+            If false, update the gradients without using NAG. Default: False.
 
     Inputs:
         - **var** (Parameter) - Parameters to be updated. With float32 data type.
@@ -3123,8 +3123,8 @@ class FusedSparseLazyAdam(PrimitiveWithInfer):
         - **beta1_power** (Tensor) - :math:`beta_1^t` in the updating formula. With float32 data type.
         - **beta2_power** (Tensor) - :math:`beta_2^t` in the updating formula. With float32 data type.
         - **lr** (Tensor) - :math:`l` in the updating formula. With float32 data type.
-        - **beta1** (Tensor) - The exponential decay rate for the 1st moment estimates. With float32 data type.
-        - **beta2** (Tensor) - The exponential decay rate for the 2nd moment estimates. With float32 data type.
+        - **beta1** (Tensor) - The exponential decay rate for the 1st moment estimations. With float32 data type.
+        - **beta2** (Tensor) - The exponential decay rate for the 2nd moment estimations. With float32 data type.
         - **epsilon** (Tensor) - Term added to the denominator to improve numerical stability. With float32 data type.
         - **gradient** (Tensor) - Gradient value. With float32 data type.
         - **indices** (Tensor) - Gradient indices. With int32 data type.
@@ -3227,7 +3227,7 @@ class FusedSparseFtrl(PrimitiveWithInfer):
         l2 (float): l2 regularization strength, must be greater than or equal to zero.
         lr_power (float): Learning rate power controls how the learning rate decreases during training,
             must be less than or equal to zero. Use fixed learning rate if `lr_power` is zero.
-        use_locking (bool): Use locks for update operation if True . Default: False.
+        use_locking (bool): Use locks for the update operation if True. Default: False.
 
     Inputs:
         - **var** (Parameter) - The variable to be updated. The data type must be float32.
@@ -3320,7 +3320,7 @@ class FusedSparseProximalAdagrad(PrimitiveWithInfer):
             var = \frac{sign(\text{prox_v})}{1 + lr * l2} * \max(\left| \text{prox_v} \right| - lr * l1, 0)
 
     Args:
-        use_locking (bool): If True, updating of the var and accum tensors will be protected. Default: False.
+        use_locking (bool): If true, updates of the var and accum tensors will be protected. Default: False.
 
     Inputs:
         - **var** (Parameter) - Variable tensor to be updated. The data type must be float32.
@@ -3415,7 +3415,7 @@ class KLDivLoss(PrimitiveWithInfer):
             \end{cases}
 
     Args:
-        reduction (str): Specifies the reduction to apply to the output.
+        reduction (str): Specifies the reduction to be applied to the output.
             Its value should be one of 'none', 'mean', 'sum'. Default: 'mean'.
 
     Inputs:
@@ -3487,7 +3487,7 @@ class BinaryCrossEntropy(PrimitiveWithInfer):
             \end{cases}
 
     Args:
-        reduction (str): Specifies the reduction to apply to the output.
+        reduction (str): Specifies the reduction to be applied to the output.
             Its value should be one of 'none', 'mean', 'sum'. Default: 'mean'.
 
     Inputs:
@@ -3575,9 +3575,9 @@ class ApplyAdaMax(PrimitiveWithInfer):
           With float32 or float16 data type.
         - **lr** (Union[Number, Tensor]) - Learning rate, :math:`l` in the updating formula, should be scalar.
           With float32 or float16 data type.
-        - **beta1** (Union[Number, Tensor]) - The exponential decay rate for the 1st moment estimates,
+        - **beta1** (Union[Number, Tensor]) - The exponential decay rate for the 1st moment estimations,
           should be scalar. With float32 or float16 data type.
-        - **beta2** (Union[Number, Tensor]) - The exponential decay rate for the 2nd moment estimates,
+        - **beta2** (Union[Number, Tensor]) - The exponential decay rate for the 2nd moment estimations,
           should be scalar. With float32 or float16 data type.
         - **epsilon** (Union[Number, Tensor]) - A small value added for numerical stability, should be scalar.
           With float32 or float16 data type.
@@ -3939,7 +3939,7 @@ class SparseApplyAdagrad(PrimitiveWithInfer):
     Args:
         lr (float): Learning rate.
         update_slots (bool): If `True`, `accum` will be updated. Default: True.
-        use_locking (bool): If True, updating of the var and accum tensors will be protected. Default: False.
+        use_locking (bool): If true, updates of the var and accum tensors will be protected. Default: False.
 
     Inputs:
         - **var** (Parameter) - Variable to be updated. The data type must be float16 or float32.
@@ -4099,7 +4099,7 @@ class ApplyProximalAdagrad(PrimitiveWithInfer):
             var = \frac{sign(\text{prox_v})}{1 + lr * l2} * \max(\left| \text{prox_v} \right| - lr * l1, 0)
 
     Args:
-        use_locking (bool): If True, updating of the var and accum tensors will be protected. Default: False.
+        use_locking (bool): If true, updates of the var and accum tensors will be protected. Default: False.
 
     Inputs:
         - **var** (Parameter) - Variable to be updated. The data type should be float16 or float32.
@@ -4195,7 +4195,7 @@ class SparseApplyProximalAdagrad(PrimitiveWithInfer):
             var = \frac{sign(\text{prox_v})}{1 + lr * l2} * \max(\left| \text{prox_v} \right| - lr * l1, 0)
 
     Args:
-        use_locking (bool): If True, updating of the var and accum tensors will be protected. Default: False.
+        use_locking (bool): If true, updates of the var and accum tensors will be protected. Default: False.
 
     Inputs:
         - **var** (Parameter) - Variable tensor to be updated. The data type must be float16 or float32.
@@ -4697,7 +4697,7 @@ class ApplyFtrl(PrimitiveWithInfer):
     Update relevant entries according to the FTRL scheme.
 
     Args:
-        use_locking (bool): Use locks for update operation if True . Default: False.
+        use_locking (bool): Use locks for the update operation if True. Default: False.
 
     Inputs:
         - **var** (Parameter) - The variable to be updated. The data type should be float16 or float32.
@@ -4788,7 +4788,7 @@ class SparseApplyFtrl(PrimitiveWithInfer):
         l2 (float): l2 regularization strength, must be greater than or equal to zero.
         lr_power (float): Learning rate power controls how the learning rate decreases during training,
             must be less than or equal to zero. Use fixed learning rate if `lr_power` is zero.
-        use_locking (bool): Use locks for update operation if True . Default: False.
+        use_locking (bool): Use locks for the update operation if True. Default: False.
 
     Inputs:
         - **var** (Parameter) - The variable to be updated. The data type must be float16 or float32.
@@ -4967,8 +4967,8 @@ class ConfusionMulGrad(PrimitiveWithInfer):
         axis (Union[int, tuple[int], list[int]]): The dimensions to reduce.
             Default:(), reduce all dimensions. Only constant value is allowed.
         keep_dims (bool):
-            - If True, keep these reduced dimensions and the length is 1.
-            - If False, don't keep these dimensions. Default:False.
+            - If true, keep these reduced dimensions and the length is 1.
+            - If false, don't keep these dimensions. Default:False.
 
     Inputs:
         - **input_0** (Tensor) - The input Tensor.
@@ -5094,9 +5094,9 @@ class CTCLoss(PrimitiveWithInfer):
     Calculates the CTC(Connectionist Temporal Classification) loss. Also calculates the gradient.
 
     Args:
-        preprocess_collapse_repeated (bool): If True, repeated labels are collapsed prior to the CTC calculation.
+        preprocess_collapse_repeated (bool): If true, repeated labels are collapsed prior to the CTC calculation.
                                              Default: False.
-        ctc_merge_repeated (bool): If False, during CTC calculation, repeated non-blank labels will not be merged
+        ctc_merge_repeated (bool): If false, during CTC calculation, repeated non-blank labels will not be merged
                                    and are interpreted as individual labels. This is a simplfied version of CTC.
                                    Default: True.
         ignore_longer_outputs_than_inputs (bool): If True, sequences with longer outputs than inputs will be ignored.
@@ -5192,7 +5192,7 @@ class BasicLSTMCell(PrimitiveWithInfer):
         keep_prob (float): If not 1.0, append `Dropout` layer on the outputs of each
             LSTM layer except the last layer. Default 1.0. The range of dropout is [0.0, 1.0].
         forget_bias (float): Add forget bias to forget gate biases in order to decrease former scale. Default to 1.0.
-        state_is_tuple (bool): If True, state is tensor tuple, containing h and c; If False, one tensor,
+        state_is_tuple (bool): If true, state is a tensor tuple containing h and c; if false, one tensor,
           need split first. Default to True.
         activation (str): Activation. Default to "tanh".
 
diff --git a/mindspore/train/quant/quant.py b/mindspore/train/quant/quant.py
index e858bedc630..4bd87d14801 100644
--- a/mindspore/train/quant/quant.py
+++ b/mindspore/train/quant/quant.py
@@ -496,12 +496,11 @@ def convert_quant_network(network,
         per_channel (bool, list or tuple):  Quantization granularity based on layer or on channel. If `True`
             then base on per channel otherwise base on per layer. The first element represent weights
             and second element represent data flow. Default: (False, False)
-        symmetric (bool, list or tuple): Quantization algorithm use symmetric or not. If `True` then base on
+        symmetric (bool, list or tuple): Whether the quantization algorithm is symmetric or not. If `True` then base on
             symmetric otherwise base on asymmetric. The first element represent weights and second
             element represent data flow. Default: (False, False)
-        narrow_range (bool, list or tuple): Quantization algorithm use narrow range or not. If `True` then base
-            on narrow range otherwise base on off narrow range. The first element represent weights and
-            second element represent data flow. Default: (False, False)
+        narrow_range (bool, list or tuple): Whether the quantization algorithm uses a narrow range or not.
+            The first element represents weights and the second element represents data flow. Default: (False, False)
 
     Returns:
         Cell, Network which has change to quantization aware training network cell.
diff --git a/mindspore/train/quant/quant_utils.py b/mindspore/train/quant/quant_utils.py
index 1e2481ceaa0..d01c7645191 100644
--- a/mindspore/train/quant/quant_utils.py
+++ b/mindspore/train/quant/quant_utils.py
@@ -31,8 +31,8 @@ def cal_quantization_params(input_min,
         input_max (numpy.ndarray): The dimension of channel or 1.
         data_type (numpy type) : Can ben numpy int8, numpy uint8.
         num_bits (int): Quantization number bit, support 4 and 8bit. Default: 8.
-        symmetric (bool): Quantization algorithm use symmetric or not. Default: False.
-        narrow_range (bool): Quantization algorithm use narrow range or not. Default: False.
+        symmetric (bool): Whether the quantization algorithm is symmetric or not. Default: False.
+        narrow_range (bool): Whether the quantization algorithm uses a narrow range or not. Default: False.
 
     Returns:
         scale (numpy.ndarray): quantization param.
diff --git a/setup.py b/setup.py
index f591c61742e..05ae3337dd6 100644
--- a/setup.py
+++ b/setup.py
@@ -34,7 +34,7 @@ pkg_dir = os.path.join(pwd, 'build/package')
 
 
 def _read_file(filename):
-    with open(os.path.join(pwd, filename)) as f:
+    with open(os.path.join(pwd, filename), encoding='UTF-8') as f:
         return f.read()