!41677 modify format

Merge pull request !41677 from 俞涵/code_docs_0819
i-robot 2022-09-09 02:08:11 +00:00 committed by Gitee
commit 50ed6ee1d5
33 changed files with 36 additions and 36 deletions


@ -24,7 +24,7 @@
- auto: Sets the expert-recommended mixed-precision level for different devices, for example "O2" on GPU and "O3" on Ascend. This setting may not fit some scenarios, so users are advised to set `amp_level` manually according to the specific network.
"O2" is recommended on GPU, "O3" is recommended on Ascend.
The BatchNorm precision strategy can be changed by setting `keep_batchnorm_fp32` through `kwargs`; `keep_batchnorm_fp32` must be a bool. The loss-scaling strategy can be changed by setting `loss_scale_manager` through `kwargs`; `loss_scale_manager` must be a subclass of :class:`mindspore.LossScaleManager`,
The BatchNorm precision strategy can be changed by setting `keep_batchnorm_fp32` through `kwargs`; `keep_batchnorm_fp32` must be a bool. The loss-scaling strategy can be changed by setting `loss_scale_manager` through `kwargs`; `loss_scale_manager` must be a subclass of :class:`mindspore.amp.LossScaleManager`,
For more details about `amp_level`, see `mindspore.build_train_network`.
- **boost_level** (str) - Option for the argument `level` in `mindspore.boost`, the training level for boost mode. Supports ["O0", "O1", "O2"]. Default: "O0".
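
For orientation, here is a minimal sketch of how `amp_level` and the `kwargs` options above are passed to `Model` (illustrative only; the `nn.Dense` network, loss and optimizer are placeholders):

>>> import mindspore as ms
>>> from mindspore import nn
>>> net = nn.Dense(16, 10)                      # placeholder network
>>> loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True)
>>> opt = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9)
>>> # keep_batchnorm_fp32 is forwarded to the underlying build_train_network through kwargs
>>> model = ms.Model(net, loss_fn=loss, optimizer=opt, amp_level="O2", keep_batchnorm_fp32=True)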


@ -17,7 +17,7 @@ mindspore.nn.ASGD
\mu_{t} = \frac{1}{\max(1, t - t0)}
\end{gather*}
:math:`\lambda` represents the decay term, :math:`\mu` and :math:`\eta` are tracked to update :math:`ax` and :math:`w`, :math:`t0` represents the point at which averaging starts, :math:`\alpha` represents the coefficient for updating :math:`\eta`, :math:`ax` represents the averaged parameter value, :math:`t` represents the current step, :math:`g` represents `gradients`, and :math:`w` represents `params`.
.. note::
If the parameters are not grouped, the `weight_decay` in the optimizer is applied to parameters whose names do not contain "beta" or "gamma". Users can group parameters to change the weight-decay strategy. When the parameters are grouped, each group can set its own `weight_decay`; if it is not set, the `weight_decay` in the optimizer is applied.
@ -40,7 +40,7 @@ mindspore.nn.ASGD
- **t0** (float) - The point at which averaging starts. Default: 1e6.
- **weight_decay** (Union[float, int, Cell]) - Weight decay (L2 penalty). Default: 0.0.
.. include:: mindspore.nn.optim_arg_dynamic_wd.rst
.. include:: mindspore.nn.optim_arg_dynamic_wd.rst
Inputs:
- **gradients** (tuple[Tensor]) - The gradients of `params`, with the same shape as `params`.
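
As a minimal sketch of constructing the optimizer described above (illustrative only; `nn.Dense` stands in for any user-defined network):

>>> from mindspore import nn
>>> net = nn.Dense(3, 2)                        # placeholder network
>>> optimizer = nn.ASGD(net.trainable_params(), learning_rate=0.1,
...                     lambd=1e-4, alpha=0.75, t0=1e6, weight_decay=0.0)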


@ -1 +1 @@
- **loss_scale** (float) - Gradient scaling factor, which must be greater than 0. If `loss_scale` is an integer, it will be converted to a float. In general, use the default value. Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` attribute of `FixedLossScaleManager` is set to False, this value needs to be the same as the `loss_scale` in `FixedLossScaleManager`. Refer to :class:`mindspore.FixedLossScaleManager` for more details. Default: 1.0.
- **loss_scale** (float) - Gradient scaling factor, which must be greater than 0. If `loss_scale` is an integer, it will be converted to a float. In general, use the default value. Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` attribute of `FixedLossScaleManager` is set to False, this value needs to be the same as the `loss_scale` in `FixedLossScaleManager`. Refer to :class:`mindspore.amp.FixedLossScaleManager` for more details. Default: 1.0.
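
To make the coupling concrete, a small sketch (illustrative only; the network and optimizer are placeholders) where the optimizer's `loss_scale` mirrors the manager's value because `drop_overflow_update` is False:

>>> import mindspore as ms
>>> from mindspore import nn
>>> net = nn.Dense(8, 2)                        # placeholder network
>>> manager = ms.amp.FixedLossScaleManager(loss_scale=1024.0, drop_overflow_update=False)
>>> # with drop_overflow_update=False the optimizer must use the same loss_scale
>>> opt = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9, loss_scale=1024.0)
>>> model = ms.Model(net, loss_fn=nn.MAELoss(), optimizer=opt, loss_scale_manager=manager)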


@ -1,7 +1,7 @@
mindspore.ops.AlltoAll
======================
.. py:class:: mindspore.ops.AlltoAll(split_count, split_dim, concat_dim, group='hccl_world_group')
.. py:class:: mindspore.ops.AlltoAll(split_count, split_dim, concat_dim, group=GlobalComm.WORLD_COMM_GROUP)
AlltoAll is a collective communication function.


@ -24,3 +24,4 @@ mindspore.ops.CropAndResize
Raises:
- **TypeError** - If `method` is not a str.
- **TypeError** - If `extrapolation_value` is not a float.
- **ValueError** - If `method` is not 'bilinear', 'nearest' or 'bilinear_v2'.


@ -16,4 +16,4 @@ mindspore.ops.EqualCount
Raises:
- **TypeError** - If `x` or `y` is not a Tensor.
- **ValueError** - If the shape of `x` is not equal to the shape of `y`.


@ -12,7 +12,7 @@ mindspore.ops.SparseTensorDenseMatmul
Inputs:
- **indices** (Tensor) - A 2-D Tensor that represents the positions of elements in the sparse tensor. Supports int32 and int64; every element value should be non-negative. The shape is :math:`(n, 2)`.
- **values** (Tensor) - A 1-D Tensor that represents the values corresponding to the positions in `indices`. Supports float16, float32, float64, int32, int64, complex64 and complex128. The shape should be :math:`(n,)`.
- **sparse_shape** (tuple(int)) - Specifies the shape of the sparse tensor; it consists of two positive integers, indicating that the shape of the sparse tensor is :math:`(N, C)`.
- **sparse_shape** (tuple(int) or Tensor) - Specifies the shape of the sparse tensor; it consists of two positive integers, indicating that the shape of the sparse tensor is :math:`(N, C)`.
- **dense** (Tensor) - A 2-D Tensor whose data type is the same as `values`.
If `adjoint_st` is False and `adjoint_dt` is False, the shape must be :math:`(C, M)`.
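
A short usage sketch of the inputs listed above (illustrative only), multiplying a 3x4 sparse tensor by a 4x2 dense tensor:

>>> import numpy as np
>>> import mindspore as ms
>>> from mindspore import Tensor, ops
>>> indices = Tensor(np.array([[0, 1], [1, 2]]), ms.int32)
>>> values = Tensor(np.array([1.0, 2.0]), ms.float32)
>>> sparse_shape = (3, 4)
>>> dense = Tensor(np.ones((4, 2)), ms.float32)
>>> out = ops.SparseTensorDenseMatmul()(indices, values, sparse_shape, dense)
>>> print(out.shape)
(3, 2)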


@ -5,7 +5,7 @@ mindspore.ops.SquaredDifference
Subtracts the second input Tensor from the first input Tensor element-wise and returns the square of the result.
The inputs `x` and `y` follow implicit type-conversion rules to keep the data types consistent. The inputs must be two Tensors, or one Tensor and one Scalar. When the inputs are two Tensors, their data types cannot both be bool, and their shapes must be broadcastable. When the inputs are one Tensor and one Scalar, the Scalar can only be a constant.
.. math::
out_{i} = (x_{i} - y_{i}) * (x_{i} - y_{i}) = (x_{i} - y_{i})^2
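
A quick numeric sketch of the formula above (illustrative only):

>>> import numpy as np
>>> import mindspore as ms
>>> from mindspore import Tensor, ops
>>> x = Tensor(np.array([1.0, 2.0, 3.0]), ms.float32)
>>> y = Tensor(np.array([2.0, 4.0, 6.0]), ms.float32)
>>> print(ops.SquaredDifference()(x, y))
[1. 4. 9.]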


@ -5,7 +5,7 @@ mindspore.ops.assign
Assigns a value to a network parameter.
`variable` and `value` follow implicit type-conversion rules to keep the data types consistent. If they have different data types, the lower-precision data type is converted to the relatively highest-precision data type.
Parameters:
- **variable** (Parameter) - The network parameter, of shape :math:`(N, *)`, where :math:`*` means any number of additional dimensions; its rank should be less than 8.


@ -1,7 +1,7 @@
mindspore.ops.count_nonzero
============================
.. py:function:: mindspore.ops.count_nonzero(x, axis=(), keep_dims=False, dtype=mindspore.int32)
.. py:function:: mindspore.ops.count_nonzero(x, axis=(), keep_dims=False, dtype=mstype.int32)
Counts the number of nonzero elements along the given axes of the input Tensor.
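
A small sketch of the call (illustrative only), counting nonzero elements along axis 1:

>>> import numpy as np
>>> import mindspore as ms
>>> from mindspore import Tensor, ops
>>> x = Tensor(np.array([[0, 1, 0], [1, 1, 0]]), ms.float32)
>>> print(ops.count_nonzero(x, axis=1, keep_dims=False, dtype=ms.int32))
[1 2]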


@ -1,7 +1,7 @@
mindspore.ops.custom_info_register
==================================
.. py:class:: mindspore.ops.custom_info_register(*reg_info)
.. py:function:: mindspore.ops.custom_info_register(*reg_info)
A decorator used to bind registration information to the `func` argument of :class:`mindspore.ops.Custom`.


@ -131,7 +131,7 @@ class Adagrad(Optimizer):
loss_scale (float): Value for the loss scale. It must be greater than 0.0. In general, use the default value.
Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in
`FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in
`FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details.
`FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details.
Default: 1.0.
weight_decay (Union[float, int, Cell]): Weight decay (L2 penalty). Default: 0.0.


@ -113,7 +113,7 @@ class Adadelta(Optimizer):
loss_scale (float): Value for the loss scale. It must be greater than 0.0. In general, use the default value.
Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in
`FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in
`FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details.
`FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details.
Default: 1.0.
weight_decay (Union[float, int, Cell]): Weight decay (L2 penalty). Default: 0.0.


@ -232,7 +232,7 @@ class AdaFactor(Optimizer):
loss_scale (float): A floating point value for the loss scale. Should be greater than 0. In general, use the
default value. Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in
`FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in
`FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details.
`FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details.
Default: 1.0.
Inputs:


@ -441,7 +441,7 @@ class Adam(Optimizer):
loss_scale (float): A floating point value for the loss scale. Should be greater than 0. In general, use the
default value. Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in
`FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in
`FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details.
`FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details.
Default: 1.0.
Inputs:
@ -902,7 +902,7 @@ class AdamOffload(Optimizer):
loss_scale (float): A floating point value for the loss scale. Should be greater than 0. In general, use the
default value. Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in
`FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in
`FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details.
`FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details.
Default: 1.0.
Inputs:


@ -136,7 +136,7 @@ class AdaMax(Optimizer):
loss_scale (float): A floating point value for the loss scale. Should be greater than 0. In general, use the
default value. Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in
`FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in
`FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details.
`FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details.
Default: 1.0.
Inputs:


@ -191,7 +191,7 @@ class FTRL(Optimizer):
loss_scale (float): Value for the loss scale. It must be greater than 0.0. In general, use the default value.
Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in
`FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in
`FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details.
`FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details.
Default: 1.0.
weight_decay (Union[float, int, Cell]): Weight decay (L2 penalty). Default: 0.0.


@ -285,7 +285,7 @@ class LazyAdam(Optimizer):
loss_scale (float): A floating point value for the loss scale. Should be equal to or greater than 1. In general,
use the default value. Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update`
in `FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in
`FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details.
`FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details.
Default: 1.0.
Inputs:


@ -145,7 +145,7 @@ class Momentum(Optimizer):
loss_scale (float): A floating point value for the loss scale. It must be greater than 0.0. In general, use the
default value. Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in
`FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in
`FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details.
`FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details.
Default: 1.0.
use_nesterov (bool): Enable Nesterov momentum. Default: False.


@ -125,7 +125,7 @@ class Optimizer(Cell):
type of `loss_scale` input is int, it will be converted to float. In general, use the default value. Only
when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in
`FixedLossScaleManager` is set to False, this value needs to be the same as the `loss_scale` in
`FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details.
`FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details.
Default: 1.0.
Raises:


@ -131,7 +131,7 @@ class ProximalAdagrad(Optimizer):
loss_scale (float): Value for the loss scale. It must be greater than 0.0. In general, use the default value.
Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in
`FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in
`FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details.
`FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details.
Default: 1.0.
weight_decay (Union[float, int, Cell]): Weight decay (L2 penalty). Default: 0.0.


@ -145,7 +145,7 @@ class RMSProp(Optimizer):
loss_scale (float): A floating point value for the loss scale. Should be greater than 0. In general, use the
default value. Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in
`FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in
`FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details.
`FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details.
Default: 1.0.
weight_decay (Union[float, int, Cell]): Weight decay (L2 penalty). Default: 0.0.


@ -110,7 +110,7 @@ class SGD(Optimizer):
loss_scale (float): A floating point value for the loss scale, which must be larger than 0.0. In general, use
the default value. Only when `FixedLossScaleManager` is used for training and the `drop_overflow_update` in
`FixedLossScaleManager` is set to False, then this value needs to be the same as the `loss_scale` in
`FixedLossScaleManager`. Refer to class :class:`mindspore.FixedLossScaleManager` for more details.
`FixedLossScaleManager`. Refer to class :class:`mindspore.amp.FixedLossScaleManager` for more details.
Default: 1.0.
Inputs:


@ -2518,7 +2518,7 @@ class TransformerDecoder(Cell):
'relu6', 'tanh', 'gelu', 'fast_gelu', 'elu', 'sigmoid', 'prelu', 'leakyrelu', 'hswish',
'hsigmoid', 'logsigmoid' and so on. Default: gelu.
lambda_func(function): A function can determine the fusion index,
pipeline stages and recompute attribute. If the
pipeline stages and recompute attribute. If the
user wants to determine the pipeline stage and gradient aggregation fusion, the user can pass a
function that accepts `network`, `layer_id`, `offset`, `parallel_config`, `layers`. The `network(Cell)`
represents the transformer block, `layer_id(int)` means the layer index for the current module, counts


@ -349,7 +349,7 @@ def multinomial(inputs, num_sample, replacement=True, seed=None):
seed (int, optional): Seed is used as entropy source for the random number engines to generate
pseudo-random numbers, must be non-negative. Default: None.
Outputs:
Returns:
Tensor, has the same rows with input. The number of sampled indices of each row is `num_samples`.
The dtype is float32.
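
A small sketch of the call (illustrative only), drawing 2 samples from a 4-way categorical distribution:

>>> import mindspore as ms
>>> from mindspore import Tensor, ops
>>> # unnormalized weights for 4 categories; category 1 is the most likely draw
>>> x = Tensor([0., 9., 4., 0.], ms.float32)
>>> output = ops.multinomial(x, 2, replacement=True, seed=10)
>>> print(output.shape)
(2,)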


@ -6895,7 +6895,7 @@ class ExtractVolumePatches(Primitive):
Supported Platforms:
``Ascend`` ``CPU``
Example:
Examples:
>>> kernel_size = (1, 1, 2, 2, 2)
>>> strides = (1, 1, 1, 1, 1)
>>> padding = "VALID"


@ -686,7 +686,7 @@ class NeighborExchange(Primitive):
Supported Platforms:
``Ascend``
Example:
Examples:
>>> # This example should be run with 2 devices. Refer to the tutorial > Distributed Training on mindspore.cn
>>> import os
>>> import mindspore as ms
@ -762,7 +762,7 @@ class AlltoAll(PrimitiveWithInfer):
Supported Platforms:
``Ascend``
Example:
Examples:
>>> # This example should be run with 8 devices. Refer to the tutorial > Distributed Training on mindspore.cn
>>> import os
>>> import mindspore as ms


@ -2055,7 +2055,7 @@ class Rsqrt(Primitive):
Inputs:
- **x** (Tensor) - The input of Rsqrt. Its rank must be in [0, 7] inclusive and
each element must be a non-negative number.
each element must be a non-negative number.
Outputs:
Tensor, has the same type and shape as `x`.
@ -2092,7 +2092,6 @@ class Sqrt(Primitive):
out_{i} = \sqrt{x_{i}}
Inputs:
- **x** (Tensor) - The input tensor with a dtype of Number, its rank must be in [0, 7] inclusive.


@ -623,7 +623,7 @@ class SparseTensorDenseMatmul(Primitive):
Support int32, int64, each element value should be a non-negative int number. The shape is :math:`(n, 2)`.
- **values** (Tensor) - A 1-D Tensor, represents the value corresponding to the position in the `indices`.
Support float16, float32, float64, int32, int64, complex64, complex128. The shape should be :math:`(n,)`.
- **sparse_shape** (tuple(int)) or (Tensor) - A positive int tuple or tensor which specifies the shape of
- **sparse_shape** (tuple(int) or (Tensor)) - A positive int tuple or tensor which specifies the shape of
sparse tensor, and only constant value is allowed when sparse_shape is a tensor, should have 2 elements,
represent sparse tensor shape is :math:`(N, C)`.
- **dense** (Tensor) - A 2-D Tensor, the dtype is same as `values`.


@ -281,8 +281,8 @@ def build_train_network(network, optimizer, loss_fn=None, level='O0', boost_leve
keep_batchnorm_fp32 (bool): Keep Batchnorm run in `float32` when the network is set to cast to `float16` . If
set, the `level` setting will take no effect on this property.
loss_scale_manager (Union[None, LossScaleManager]): If not None, must be subclass of
:class:`mindspore.LossScaleManager` for scaling the loss. If set, the `level` setting will take no effect
on this property.
:class:`mindspore.amp.LossScaleManager` for scaling the loss. If set, the `level` setting will
take no effect on this property.
Raises:
ValueError: If device is CPU, property `loss_scale_manager` is not `None` or `FixedLossScaleManager`
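
For orientation, a minimal sketch of calling this helper with a loss scale manager (illustrative only; the network, loss and optimizer are placeholders):

>>> import mindspore as ms
>>> from mindspore import nn
>>> net = nn.Dense(4, 2)                        # placeholder network
>>> loss = nn.MSELoss()
>>> opt = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9)
>>> manager = ms.amp.FixedLossScaleManager(128.0, drop_overflow_update=False)
>>> train_net = ms.amp.build_train_network(net, opt, loss, level="O2", loss_scale_manager=manager)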


@ -29,7 +29,7 @@ class LossScaleManager:
`get_update_cell` is used to get the instance of :class:`mindspore.nn.Cell` that is used to update the loss scale,
the instance will be called during the training. Currently, the `get_update_cell` is mostly used.
For example, :class:`mindspore.FixedLossScaleManager` and :class:`mindspore.DynamicLossScaleManager`.
For example, :class:`mindspore.amp.FixedLossScaleManager` and :class:`mindspore.amp.DynamicLossScaleManager`.
"""
def get_loss_scale(self):
"""Get the value of loss scale, which is the amplification factor of the gradients."""


@ -140,7 +140,7 @@ class Model:
"O2" is recommended on GPU, "O3" is recommended on Ascend.
The BatchNorm strategy can be changed by `keep_batchnorm_fp32` settings in `kwargs`. `keep_batchnorm_fp32`
must be a bool. The loss scale strategy can be changed by `loss_scale_manager` setting in `kwargs`.
`loss_scale_manager` should be a subclass of :class:`mindspore.LossScaleManager`.
`loss_scale_manager` should be a subclass of :class:`mindspore.amp.LossScaleManager`.
The more detailed explanation of `amp_level` setting can be found at `mindspore.build_train_network`.
boost_level (str): Option for argument `level` in `mindspore.boost`, level for boost mode