diff --git a/docs/api/api_python/mindspore/mindspore.Model.rst b/docs/api/api_python/mindspore/mindspore.Model.rst index 009131515e0..2fe154b3887 100644 --- a/docs/api/api_python/mindspore/mindspore.Model.rst +++ b/docs/api/api_python/mindspore/mindspore.Model.rst @@ -24,7 +24,7 @@ - auto: 为不同处理器设置专家推荐的混合精度等级,如在GPU上设为"O2",在Ascend上设为"O3"。该设置方式可能在部分场景下不适用,建议用户根据具体的网络模型自定义设置 `amp_level` 。 在GPU上建议使用"O2",在Ascend上建议使用"O3"。 - 通过 `kwargs` 设置 `keep_batchnorm_fp32` ,可修改BatchNorm的精度策略, `keep_batchnorm_fp32` 必须为bool类型;通过 `kwargs` 设置 `loss_scale_manager` 可修改损失缩放策略,`loss_scale_manager` 必须为 :class:`mindspore.LossScaleManager` 的子类, + 通过 `kwargs` 设置 `keep_batchnorm_fp32` ,可修改BatchNorm的精度策略, `keep_batchnorm_fp32` 必须为bool类型;通过 `kwargs` 设置 `loss_scale_manager` 可修改损失缩放策略,`loss_scale_manager` 必须为 :class:`mindspore.amp.LossScaleManager` 的子类, 关于 `amp_level` 详见 `mindpore.build_train_network` 。 - **boost_level** (str) - `mindspore.boost` 的可选参数,为boost模式训练等级。支持["O0", "O1", "O2"]. 默认值:"O0"。 diff --git a/docs/api/api_python/nn/mindspore.nn.ASGD.rst b/docs/api/api_python/nn/mindspore.nn.ASGD.rst index a1c940f49cd..8b0cba9decd 100644 --- a/docs/api/api_python/nn/mindspore.nn.ASGD.rst +++ b/docs/api/api_python/nn/mindspore.nn.ASGD.rst @@ -17,7 +17,7 @@ mindspore.nn.ASGD \mu_{t} = \frac{1}{\max(1, t - t0)} \end{gather*} - :math:`\lambda` 代表衰减项, :math:`\mu` 和 :math:`\eta` 被跟踪以更新 :math:`ax` 和 :math:`w` , :math:`t0` 代表开始平均的点, :math:`\α` 代表 :math:`\eta` 更新的系数, :math:`ax` 表示平均参数值, :math:`t` 表示当前步数(step),:math:`g` 表示 `gradients` , :math:`w` 表示`params` 。 + :math:`\lambda` 代表衰减项, :math:`\mu` 和 :math:`\eta` 被跟踪以更新 :math:`ax` 和 :math:`w` , :math:`t0` 代表开始平均的点, :math:`\α` 代表 :math:`\eta` 更新的系数, :math:`ax` 表示平均参数值, :math:`t` 表示当前步数(step),:math:`g` 表示 `gradients` , :math:`w` 表示 `params` 。 .. note:: 如果参数未分组,则优化器中的 `weight_decay` 将应用于名称中没有"beta"或"gamma"的参数。用户可以对参数进行分组,以更改权重衰减策略。当参数分组时,每个组都可以设置 `weight_decay` ,如果没有,将应用优化器中的 `weight_decay` 。 @@ -40,7 +40,7 @@ mindspore.nn.ASGD - **t0** (float) - 开始平均的点。默认值:1e6。 - **weight_decay** (Union[float, int, Cell]) - 权重衰减(L2 penalty)。默认值:0.0。 - .. include:: mindspore.nn.optim_arg_dynamic_wd.rst + .. include:: mindspore.nn.optim_arg_dynamic_wd.rst 输入: - **gradients** (tuple[Tensor]) - `params` 的梯度,shape与 `params` 相同。 diff --git a/docs/api/api_python/nn/mindspore.nn.optim_arg_loss_scale.rst b/docs/api/api_python/nn/mindspore.nn.optim_arg_loss_scale.rst index d9efd3f6505..155c0e462df 100644 --- a/docs/api/api_python/nn/mindspore.nn.optim_arg_loss_scale.rst +++ b/docs/api/api_python/nn/mindspore.nn.optim_arg_loss_scale.rst @@ -1 +1 @@ -- **loss_scale** (float) - 梯度缩放系数,必须大于0。如果 `loss_scale` 是整数,它将被转换为浮点数。通常使用默认值,仅当训练时使用了 `FixedLossScaleManager`,且 `FixedLossScaleManager` 的 `drop_overflow_update` 属性配置为False时,此值需要与 `FixedLossScaleManager` 中的 `loss_scale` 相同。有关更多详细信息,请参阅 :class:`mindspore.FixedLossScaleManager`。默认值:1.0。 +- **loss_scale** (float) - 梯度缩放系数,必须大于0。如果 `loss_scale` 是整数,它将被转换为浮点数。通常使用默认值,仅当训练时使用了 `FixedLossScaleManager`,且 `FixedLossScaleManager` 的 `drop_overflow_update` 属性配置为False时,此值需要与 `FixedLossScaleManager` 中的 `loss_scale` 相同。有关更多详细信息,请参阅 :class:`mindspore.amp.FixedLossScaleManager`。默认值:1.0。 diff --git a/docs/api/api_python/ops/mindspore.ops.AlltoAll.rst b/docs/api/api_python/ops/mindspore.ops.AlltoAll.rst index c200273ad18..7e58c7bbc5a 100644 --- a/docs/api/api_python/ops/mindspore.ops.AlltoAll.rst +++ b/docs/api/api_python/ops/mindspore.ops.AlltoAll.rst @@ -1,7 +1,7 @@ mindspore.ops.AlltoAll ====================== -.. 
py:class:: mindspore.ops.AlltoAll(split_count, split_dim, concat_dim, group='hccl_world_group') +.. py:class:: mindspore.ops.AlltoAll(split_count, split_dim, concat_dim, group=GlobalComm.WORLD_COMM_GROUP) AlltoAll是一个集合通信函数。 diff --git a/docs/api/api_python/ops/mindspore.ops.CropAndResize.rst b/docs/api/api_python/ops/mindspore.ops.CropAndResize.rst index 406d6237012..f3d63d1a484 100644 --- a/docs/api/api_python/ops/mindspore.ops.CropAndResize.rst +++ b/docs/api/api_python/ops/mindspore.ops.CropAndResize.rst @@ -24,3 +24,4 @@ mindspore.ops.CropAndResize 异常: - **TypeError** - 如果 `method` 不是str。 - **TypeError** - 如果 `extrapolation_value` 不是float,且取值不是"bilinear"、"nearest"或"bilinear_v2"。 + - **ValueError** - 如果 `method` 不是'bilinear'、 'nearest'或者'bilinear_v2'。 diff --git a/docs/api/api_python/ops/mindspore.ops.EqualCount.rst b/docs/api/api_python/ops/mindspore.ops.EqualCount.rst index 04411bcd9d1..06337b26960 100644 --- a/docs/api/api_python/ops/mindspore.ops.EqualCount.rst +++ b/docs/api/api_python/ops/mindspore.ops.EqualCount.rst @@ -16,4 +16,4 @@ mindspore.ops.EqualCount 异常: - **TypeError** - 如果 `x` 或 `y` 不是Tensor。 - - **ValueError** - 如果 `x` 与`y` 的shape不相等。 + - **ValueError** - 如果 `x` 与 `y` 的shape不相等。 diff --git a/docs/api/api_python/ops/mindspore.ops.SparseTensorDenseMatmul.rst b/docs/api/api_python/ops/mindspore.ops.SparseTensorDenseMatmul.rst index 468310cc036..3fd7c233e81 100644 --- a/docs/api/api_python/ops/mindspore.ops.SparseTensorDenseMatmul.rst +++ b/docs/api/api_python/ops/mindspore.ops.SparseTensorDenseMatmul.rst @@ -12,7 +12,7 @@ mindspore.ops.SparseTensorDenseMatmul 输入: - **indices** (Tensor) - 二维Tensor,表示元素在稀疏Tensor中的位置。支持int32、int64,每个元素值都应该是非负的。shape是 :math:`(n,2)` 。 - **values** (Tensor) - 一维Tensor,表示 `indices` 位置上对应的值。支持float16、float32、float64、int32、int64、complex64、complex128。shape应该是 :math:`(n,)` 。 - - **sparse_shape** (tuple(int)) - 指定稀疏Tensor的shape,由两个正整数组成,表示稀疏Tensor的shape为 :math:`(N, C)` 。 + - **sparse_shape** (tuple(int) 或 Tensor) - 指定稀疏Tensor的shape,由两个正整数组成,表示稀疏Tensor的shape为 :math:`(N, C)` 。 - **dense** (Tensor) - 二维Tensor,数据类型与 `values` 相同。 如果 `adjoint_st` 为False, `adjoint_dt` 为False,则shape必须为 :math:`(C, M)` 。 diff --git a/docs/api/api_python/ops/mindspore.ops.SquaredDifference.rst b/docs/api/api_python/ops/mindspore.ops.SquaredDifference.rst index e7243e6ae9e..8f1acc2d767 100644 --- a/docs/api/api_python/ops/mindspore.ops.SquaredDifference.rst +++ b/docs/api/api_python/ops/mindspore.ops.SquaredDifference.rst @@ -5,7 +5,7 @@ mindspore.ops.SquaredDifference 第一个输入Tensor元素中减去第二个输入Tensor,并返回其平方。 - `x` 和` y` 的输入遵循隐式类型转换规则,使数据类型一致。输入必须是两个Tensor或一个Tensor和一个Scalar。当输入是两个Tensor时,它们的数据类型不能同时为bool类型,并且它们的shape可以广播。当输入是一个Tensor和一个Scalar时,Scalar只能是一个常量。 + `x` 和 `y` 的输入遵循隐式类型转换规则,使数据类型一致。输入必须是两个Tensor或一个Tensor和一个Scalar。当输入是两个Tensor时,它们的数据类型不能同时为bool类型,并且它们的shape可以广播。当输入是一个Tensor和一个Scalar时,Scalar只能是一个常量。 .. 
math:: out_{i} = (x_{i} - y_{i}) * (x_{i} - y_{i}) = (x_{i} - y_{i})^2 diff --git a/docs/api/api_python/ops/mindspore.ops.func_assign.rst b/docs/api/api_python/ops/mindspore.ops.func_assign.rst index 462e9034ad1..8248a0e5c71 100644 --- a/docs/api/api_python/ops/mindspore.ops.func_assign.rst +++ b/docs/api/api_python/ops/mindspore.ops.func_assign.rst @@ -5,7 +5,7 @@ mindspore.ops.assign 为网络参数赋值。 - `variable` 和`value` 遵循隐式类型转换规则,使数据类型一致。如果它们具有不同的数据类型,则低精度数据类型将转换为相对最高精度的数据类型。 + `variable` 和 `value` 遵循隐式类型转换规则,使数据类型一致。如果它们具有不同的数据类型,则低精度数据类型将转换为相对最高精度的数据类型。 参数: - **variable** (Parameter) - 网路参数。 :math:`(N,*)` ,其中 :math:`*` 表示任意数量的附加维度,其秩应小于8。 diff --git a/docs/api/api_python/ops/mindspore.ops.func_count_nonzero.rst b/docs/api/api_python/ops/mindspore.ops.func_count_nonzero.rst index 8ce482686d0..6b717e9562f 100644 --- a/docs/api/api_python/ops/mindspore.ops.func_count_nonzero.rst +++ b/docs/api/api_python/ops/mindspore.ops.func_count_nonzero.rst @@ -1,7 +1,7 @@ mindspore.ops.count_nonzero ============================ -.. py:function:: mindspore.ops.count_nonzero(x, axis=(), keep_dims=False, dtype=mindspore.int32) +.. py:function:: mindspore.ops.count_nonzero(x, axis=(), keep_dims=False, dtype=mstype.int32) 计算输入Tensor指定轴上的非零元素的数量。 diff --git a/docs/api/api_python/ops/mindspore.ops.custom_info_register.rst b/docs/api/api_python/ops/mindspore.ops.func_custom_info_register.rst similarity index 87% rename from docs/api/api_python/ops/mindspore.ops.custom_info_register.rst rename to docs/api/api_python/ops/mindspore.ops.func_custom_info_register.rst index 0ae5b5d08a3..21292dcf93e 100644 --- a/docs/api/api_python/ops/mindspore.ops.custom_info_register.rst +++ b/docs/api/api_python/ops/mindspore.ops.func_custom_info_register.rst @@ -1,7 +1,7 @@ mindspore.ops.custom_info_register ================================== -.. py:class:: mindspore.ops.custom_info_register(*reg_info) +.. py:function:: mindspore.ops.custom_info_register(*reg_info) 装饰器,用于将注册信息绑定到: :class:`mindspore.ops.Custom` 的 `func` 参数。 diff --git a/docs/api/api_python/ops/mindspore.ops.multinomial.rst b/docs/api/api_python/ops/mindspore.ops.func_multinomial.rst similarity index 100% rename from docs/api/api_python/ops/mindspore.ops.multinomial.rst rename to docs/api/api_python/ops/mindspore.ops.func_multinomial.rst diff --git a/mindspore/python/mindspore/nn/optim/ada_grad.py b/mindspore/python/mindspore/nn/optim/ada_grad.py index fdb1c82b0a4..0c175bbe4d3 100644 --- a/mindspore/python/mindspore/nn/optim/ada_grad.py +++ b/mindspore/python/mindspore/nn/optim/ada_grad.py @@ -131,7 +131,7 @@ class Adagrad(Optimizer): loss_scale (float): Value for the loss scale. It must be greater than 0.0. In general, use the default value. Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in `FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in - `FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details. + `FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details. Default: 1.0. weight_decay (Union[float, int, Cell]): Weight decay (L2 penalty). Default: 0.0. diff --git a/mindspore/python/mindspore/nn/optim/adadelta.py b/mindspore/python/mindspore/nn/optim/adadelta.py index 83dd72ea50d..044a0441950 100644 --- a/mindspore/python/mindspore/nn/optim/adadelta.py +++ b/mindspore/python/mindspore/nn/optim/adadelta.py @@ -113,7 +113,7 @@ class Adadelta(Optimizer): loss_scale (float): Value for the loss scale. 
It must be greater than 0.0. In general, use the default value. Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in `FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in - `FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details. + `FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details. Default: 1.0. weight_decay (Union[float, int, Cell]): Weight decay (L2 penalty). Default: 0.0. diff --git a/mindspore/python/mindspore/nn/optim/adafactor.py b/mindspore/python/mindspore/nn/optim/adafactor.py index c4e7ecfd287..65b4b5fe57b 100644 --- a/mindspore/python/mindspore/nn/optim/adafactor.py +++ b/mindspore/python/mindspore/nn/optim/adafactor.py @@ -232,7 +232,7 @@ class AdaFactor(Optimizer): loss_scale (float): A floating point value for the loss scale. Should be greater than 0. In general, use the default value. Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in `FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in - `FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details. + `FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details. Default: 1.0. Inputs: diff --git a/mindspore/python/mindspore/nn/optim/adam.py b/mindspore/python/mindspore/nn/optim/adam.py index 7ef7dc84f16..5a9f2012f47 100755 --- a/mindspore/python/mindspore/nn/optim/adam.py +++ b/mindspore/python/mindspore/nn/optim/adam.py @@ -441,7 +441,7 @@ class Adam(Optimizer): loss_scale (float): A floating point value for the loss scale. Should be greater than 0. In general, use the default value. Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in `FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in - `FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details. + `FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details. Default: 1.0. Inputs: @@ -902,7 +902,7 @@ class AdamOffload(Optimizer): loss_scale (float): A floating point value for the loss scale. Should be greater than 0. In general, use the default value. Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in `FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in - `FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details. + `FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details. Default: 1.0. Inputs: diff --git a/mindspore/python/mindspore/nn/optim/adamax.py b/mindspore/python/mindspore/nn/optim/adamax.py index ec0757e0f6f..79a1b1c43bb 100644 --- a/mindspore/python/mindspore/nn/optim/adamax.py +++ b/mindspore/python/mindspore/nn/optim/adamax.py @@ -136,7 +136,7 @@ class AdaMax(Optimizer): loss_scale (float): A floating point value for the loss scale. Should be greater than 0. In general, use the default value. Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in `FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in - `FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details. 
+ `FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details. Default: 1.0. Inputs: diff --git a/mindspore/python/mindspore/nn/optim/ftrl.py b/mindspore/python/mindspore/nn/optim/ftrl.py index ba5720a1316..4821695f7a6 100644 --- a/mindspore/python/mindspore/nn/optim/ftrl.py +++ b/mindspore/python/mindspore/nn/optim/ftrl.py @@ -191,7 +191,7 @@ class FTRL(Optimizer): loss_scale (float): Value for the loss scale. It must be greater than 0.0. In general, use the default value. Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in `FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in - `FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details. + `FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details. Default: 1.0. weight_decay (Union[float, int, Cell]): Weight decay (L2 penalty). Default: 0.0. diff --git a/mindspore/python/mindspore/nn/optim/lazyadam.py b/mindspore/python/mindspore/nn/optim/lazyadam.py index 4cc308f5064..acdec290b9b 100644 --- a/mindspore/python/mindspore/nn/optim/lazyadam.py +++ b/mindspore/python/mindspore/nn/optim/lazyadam.py @@ -285,7 +285,7 @@ class LazyAdam(Optimizer): loss_scale (float): A floating point value for the loss scale. Should be equal to or greater than 1. In general, use the default value. Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in `FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in - `FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details. + `FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details. Default: 1.0. Inputs: diff --git a/mindspore/python/mindspore/nn/optim/momentum.py b/mindspore/python/mindspore/nn/optim/momentum.py index 2f193af4147..73e86aa24b7 100755 --- a/mindspore/python/mindspore/nn/optim/momentum.py +++ b/mindspore/python/mindspore/nn/optim/momentum.py @@ -145,7 +145,7 @@ class Momentum(Optimizer): loss_scale (float): A floating point value for the loss scale. It must be greater than 0.0. In general, use the default value. Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in `FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in - `FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details. + `FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details. Default: 1.0. use_nesterov (bool): Enable Nesterov momentum. Default: False. diff --git a/mindspore/python/mindspore/nn/optim/optimizer.py b/mindspore/python/mindspore/nn/optim/optimizer.py index 9df817b85ed..0c8ae76b271 100644 --- a/mindspore/python/mindspore/nn/optim/optimizer.py +++ b/mindspore/python/mindspore/nn/optim/optimizer.py @@ -125,7 +125,7 @@ class Optimizer(Cell): type of `loss_scale` input is int, it will be converted to float. In general, use the default value. Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in `FixedLossScaleManager` is set to False, this value needs to be the same as the `loss_scale` in - `FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details. + `FixedLossScaleManager`. 
Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details. Default: 1.0. Raises: diff --git a/mindspore/python/mindspore/nn/optim/proximal_ada_grad.py b/mindspore/python/mindspore/nn/optim/proximal_ada_grad.py index 7a4bec1345a..febfc056d21 100644 --- a/mindspore/python/mindspore/nn/optim/proximal_ada_grad.py +++ b/mindspore/python/mindspore/nn/optim/proximal_ada_grad.py @@ -131,7 +131,7 @@ class ProximalAdagrad(Optimizer): loss_scale (float): Value for the loss scale. It must be greater than 0.0. In general, use the default value. Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in `FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in - `FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details. + `FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details. Default: 1.0. weight_decay (Union[float, int, Cell]): Weight decay (L2 penalty). Default: 0.0. diff --git a/mindspore/python/mindspore/nn/optim/rmsprop.py b/mindspore/python/mindspore/nn/optim/rmsprop.py index 359579b9767..4a3a55ced23 100644 --- a/mindspore/python/mindspore/nn/optim/rmsprop.py +++ b/mindspore/python/mindspore/nn/optim/rmsprop.py @@ -145,7 +145,7 @@ class RMSProp(Optimizer): loss_scale (float): A floating point value for the loss scale. Should be greater than 0. In general, use the default value. Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in `FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in - `FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details. + `FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details. Default: 1.0. weight_decay (Union[float, int, Cell]): Weight decay (L2 penalty). Default: 0.0. diff --git a/mindspore/python/mindspore/nn/optim/sgd.py b/mindspore/python/mindspore/nn/optim/sgd.py index e64523a2aff..9b2c87b890e 100755 --- a/mindspore/python/mindspore/nn/optim/sgd.py +++ b/mindspore/python/mindspore/nn/optim/sgd.py @@ -110,7 +110,7 @@ class SGD(Optimizer): loss_scale (float): A floating point value for the loss scale, which must be larger than 0.0. In general, use the default value. Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in `FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in - `FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details. + `FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details. Default: 1.0. Inputs: diff --git a/mindspore/python/mindspore/nn/transformer/transformer.py b/mindspore/python/mindspore/nn/transformer/transformer.py index 0c1470f29de..bc7e32cd88e 100644 --- a/mindspore/python/mindspore/nn/transformer/transformer.py +++ b/mindspore/python/mindspore/nn/transformer/transformer.py @@ -2518,7 +2518,7 @@ class TransformerDecoder(Cell): 'relu6', 'tanh', 'gelu', 'fast_gelu', 'elu', 'sigmoid', 'prelu', 'leakyrelu', 'hswish', 'hsigmoid', 'logsigmoid' and so on. Default: gelu. lambda_func(function): A function can determine the fusion index, - pipeline stages and recompute attribute. If the + pipeline stages and recompute attribute. 
If the user wants to determine the pipeline stage and gradient aggregation fusion, the user can pass a function that accepts `network`, `layer_id`, `offset`, `parallel_config`, `layers`. The `network(Cell)` represents the transformer block, `layer_id(int)` means the layer index for the current module, counts diff --git a/mindspore/python/mindspore/ops/composite/random_ops.py b/mindspore/python/mindspore/ops/composite/random_ops.py index 34404df9842..9bedba70d25 100644 --- a/mindspore/python/mindspore/ops/composite/random_ops.py +++ b/mindspore/python/mindspore/ops/composite/random_ops.py @@ -349,7 +349,7 @@ def multinomial(inputs, num_sample, replacement=True, seed=None): seed (int, optional): Seed is used as entropy source for the random number engines to generate pseudo-random numbers, must be non-negative. Default: None. - Outputs: + Returns: Tensor, has the same rows with input. The number of sampled indices of each row is `num_samples`. The dtype is float32. diff --git a/mindspore/python/mindspore/ops/operations/array_ops.py b/mindspore/python/mindspore/ops/operations/array_ops.py index b202f5e92ef..dd8390fed37 100755 --- a/mindspore/python/mindspore/ops/operations/array_ops.py +++ b/mindspore/python/mindspore/ops/operations/array_ops.py @@ -6895,7 +6895,7 @@ class ExtractVolumePatches(Primitive): Supported Platforms: ``Ascend`` ``CPU`` - Example: + Examples: >>> kernel_size = (1, 1, 2, 2, 2) >>> strides = (1, 1, 1, 1, 1) >>> padding = "VALID" diff --git a/mindspore/python/mindspore/ops/operations/comm_ops.py b/mindspore/python/mindspore/ops/operations/comm_ops.py index bfee3417b9b..f28aabede92 100644 --- a/mindspore/python/mindspore/ops/operations/comm_ops.py +++ b/mindspore/python/mindspore/ops/operations/comm_ops.py @@ -686,7 +686,7 @@ class NeighborExchange(Primitive): Supported Platforms: ``Ascend`` - Example: + Examples: >>> # This example should be run with 2 devices. Refer to the tutorial > Distributed Training on mindspore.cn >>> import os >>> import mindspore as ms @@ -762,7 +762,7 @@ class AlltoAll(PrimitiveWithInfer): Supported Platforms: ``Ascend`` - Example: + Examples: >>> # This example should be run with 8 devices. Refer to the tutorial > Distributed Training on mindspore.cn >>> import os >>> import mindspore as ms diff --git a/mindspore/python/mindspore/ops/operations/math_ops.py b/mindspore/python/mindspore/ops/operations/math_ops.py index 7071df054e8..19d0d53ddf2 100644 --- a/mindspore/python/mindspore/ops/operations/math_ops.py +++ b/mindspore/python/mindspore/ops/operations/math_ops.py @@ -2055,7 +2055,7 @@ class Rsqrt(Primitive): Inputs: - **x** (Tensor) - The input of Rsqrt. Its rank must be in [0, 7] inclusive and - each element must be a non-negative number. + each element must be a non-negative number. Outputs: Tensor, has the same type and shape as `x`. @@ -2092,7 +2092,6 @@ class Sqrt(Primitive): out_{i} = \sqrt{x_{i}} - Inputs: - **x** (Tensor) - The input tensor with a dtype of Number, its rank must be in [0, 7] inclusive. diff --git a/mindspore/python/mindspore/ops/operations/sparse_ops.py b/mindspore/python/mindspore/ops/operations/sparse_ops.py index eab7e7e6b4c..c92119f9c0f 100644 --- a/mindspore/python/mindspore/ops/operations/sparse_ops.py +++ b/mindspore/python/mindspore/ops/operations/sparse_ops.py @@ -623,7 +623,7 @@ class SparseTensorDenseMatmul(Primitive): Support int32, int64, each element value should be a non-negative int number. The shape is :math:`(n, 2)`. 
- **values** (Tensor) - A 1-D Tensor, represents the value corresponding to the position in the `indices`. Support float16, float32, float64, int32, int64, complex64, complex128. The shape should be :math:`(n,)`. - - **sparse_shape** (tuple(int)) or (Tensor) - A positive int tuple or tensor which specifies the shape of + - **sparse_shape** (tuple(int) or (Tensor)) - A positive int tuple or tensor which specifies the shape of sparse tensor, and only constant value is allowed when sparse_shape is a tensor, should have 2 elements, represent sparse tensor shape is :math:`(N, C)`. - **dense** (Tensor) - A 2-D Tensor, the dtype is same as `values`. diff --git a/mindspore/python/mindspore/train/amp.py b/mindspore/python/mindspore/train/amp.py index 2a228c81b0e..21116b0dfcb 100644 --- a/mindspore/python/mindspore/train/amp.py +++ b/mindspore/python/mindspore/train/amp.py @@ -281,8 +281,8 @@ def build_train_network(network, optimizer, loss_fn=None, level='O0', boost_leve keep_batchnorm_fp32 (bool): Keep Batchnorm run in `float32` when the network is set to cast to `float16` . If set, the `level` setting will take no effect on this property. loss_scale_manager (Union[None, LossScaleManager]): If not None, must be subclass of - :class:`mindspore.LossScaleManager` for scaling the loss. If set, the `level` setting will take no effect - on this property. + :class:`mindspore.amp.LossScaleManager` for scaling the loss. If set, the `level` setting will + take no effect on this property. Raises: ValueError: If device is CPU, property `loss_scale_manager` is not `None` or `FixedLossScaleManager` diff --git a/mindspore/python/mindspore/train/loss_scale_manager.py b/mindspore/python/mindspore/train/loss_scale_manager.py index 2d2cff35a91..d0330298b10 100644 --- a/mindspore/python/mindspore/train/loss_scale_manager.py +++ b/mindspore/python/mindspore/train/loss_scale_manager.py @@ -29,7 +29,7 @@ class LossScaleManager: `get_update_cell` is used to get the instance of :class:`mindspore.nn.Cell` that is used to update the loss scale, the instance will be called during the training. Currently, the `get_update_cell` is mostly used. - For example, :class:`mindspore.FixedLossScaleManager` and :class:`mindspore.DynamicLossScaleManager`. + For example, :class:`mindspore.amp.FixedLossScaleManager` and :class:`mindspore.amp.DynamicLossScaleManager`. """ def get_loss_scale(self): """Get the value of loss scale, which is the amplification factor of the gradients.""" diff --git a/mindspore/python/mindspore/train/model.py b/mindspore/python/mindspore/train/model.py index 3372487806d..e0bee88e649 100644 --- a/mindspore/python/mindspore/train/model.py +++ b/mindspore/python/mindspore/train/model.py @@ -140,7 +140,7 @@ class Model: "O2" is recommended on GPU, "O3" is recommended on Ascend. The BatchNorm strategy can be changed by `keep_batchnorm_fp32` settings in `kwargs`. `keep_batchnorm_fp32` must be a bool. The loss scale strategy can be changed by `loss_scale_manager` setting in `kwargs`. - `loss_scale_manager` should be a subclass of :class:`mindspore.LossScaleManager`. + `loss_scale_manager` should be a subclass of :class:`mindspore.amp.LossScaleManager`. The more detailed explanation of `amp_level` setting can be found at `mindspore.build_train_network`. boost_level (str): Option for argument `level` in `mindspore.boost`, level for boost mode
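For reference, the `loss_scale` wording that recurs in the optimizer docstrings above boils down to one constraint: when training uses `FixedLossScaleManager` with `drop_overflow_update=False`, the optimizer's `loss_scale` must equal the manager's `loss_scale`. A minimal sketch, not part of the patch, with a toy network, loss function, and scale value assumed purely for illustration:

```python
import mindspore as ms
from mindspore import nn
from mindspore.amp import FixedLossScaleManager

# Toy network and loss, assumed only for illustration.
net = nn.Dense(16, 10)
loss_fn = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')

# With drop_overflow_update=False, the optimizer's loss_scale must match the
# manager's loss_scale, which is the constraint the docstrings describe.
scale = 1024.0
manager = FixedLossScaleManager(loss_scale=scale, drop_overflow_update=False)
opt = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9,
                  loss_scale=scale)

# keep_batchnorm_fp32 and loss_scale_manager are passed through Model's **kwargs;
# amp_level follows the recommendations quoted above ("O2" on GPU, "O3" on Ascend).
model = ms.Model(net, loss_fn=loss_fn, optimizer=opt,
                 amp_level="O2", loss_scale_manager=manager)
```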
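The `count_nonzero` default shown above (`dtype=mstype.int32`) determines the result dtype. A small usage sketch with made-up values:

```python
import numpy as np
import mindspore as ms
from mindspore import Tensor, ops

x = Tensor(np.array([[0, 1, 0], [2, 0, 3]]), ms.float32)

# Default axis=() reduces over all dimensions; the result dtype follows the
# dtype argument, which defaults to int32.
total = ops.count_nonzero(x)            # -> 3
per_row = ops.count_nonzero(x, axis=1)  # -> [1, 2]
```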
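Similarly, for the `sparse_shape` input of `SparseTensorDenseMatmul`, now documented as tuple(int) or Tensor, a brief sketch with made-up values (a plain int tuple is used here):

```python
import mindspore as ms
from mindspore import Tensor, ops

indices = Tensor([[0, 1], [1, 2]], ms.int32)   # positions of the nonzero entries
values = Tensor([1.0, 2.0], ms.float32)        # values at those positions
sparse_shape = (3, 4)                          # sparse tensor shape (N, C)
dense = Tensor([[1, 1], [2, 2], [3, 3], [4, 4]], ms.float32)  # shape (C, M)

sparse_dense_matmul = ops.SparseTensorDenseMatmul()
out = sparse_dense_matmul(indices, values, sparse_shape, dense)
# out has shape (N, M) = (3, 2): [[2., 2.], [6., 6.], [0., 0.]]
```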