forked from mindspore-Ecosystem/mindspore

modify the files

parent abf2225625
commit 1afdc74404
@@ -8,7 +8,7 @@ mindspore.SparseTensor
 `SparseTensor` 只能在 `Cell` 的构造方法中使用。

 .. note::
-    此接口从 1.7 版本开始弃用,并计划在将来移除,请使用 `COOTensor`.
+    此接口从 1.7 版本开始弃用,并计划在将来移除,请使用 `COOTensor`。

 对于稠密张量,其 `SparseTensor(indices, values, shape)` 具有 `dense[indices[i]] = values[i]` 。

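Since the note above points users at `COOTensor`, here is a minimal sketch of the replacement API (index positions, values, and shape are invented for illustration):

    import mindspore as ms
    from mindspore import Tensor, COOTensor

    # dense[0][1] = 1.0 and dense[1][2] = 2.0, all other entries are zero
    indices = Tensor([[0, 1], [1, 2]], dtype=ms.int32)
    values = Tensor([1.0, 2.0], dtype=ms.float32)
    shape = (3, 4)
    coo_tensor = COOTensor(indices, values, shape)
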
@@ -23,7 +23,7 @@ mindspore.nn.BCELoss
     \end{cases}

 .. note::
-    预测值一般是sigmoid函数的输出,因为是二分类,所以目标值应是0或者1。如果输入是0或1,则上述损失函数是无意义的。
+    预测值一般是sigmoid函数的输出。因为是二分类,所以目标值应是0或者1。如果输入是0或1,则上述损失函数是无意义的。

 参数:
     - **weight** (Tensor, 可选) - 指定每个批次二值交叉熵的权重。与输入数据的shape和数据类型相同。默认值:None。

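A hedged usage sketch of `mindspore.nn.BCELoss` matching the note above; shapes and values are invented for illustration, and `weight` shares the shape and dtype of the inputs:

    import numpy as np
    import mindspore.nn as nn
    from mindspore import Tensor

    weight = Tensor(np.ones((2, 3)).astype(np.float32))
    loss = nn.BCELoss(weight=weight, reduction='mean')
    logits = Tensor(np.array([[0.1, 0.8, 0.9], [0.5, 0.3, 0.2]], np.float32))  # sigmoid-style outputs in (0, 1)
    labels = Tensor(np.array([[0., 1., 1.], [1., 0., 0.]], np.float32))        # binary targets, 0 or 1
    output = loss(logits, labels)
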
@@ -391,7 +391,7 @@
 运行construct方法。

 .. note::
-    该函数已经弃用,将会在未来版本中删除,不推荐使用此函数。
+    该函数已经弃用,将会在未来版本中删除。不推荐使用此函数。

 参数:
     - **cast_inputs** (tuple) - Cell的输入。

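The deprecated helper above simply runs a Cell's construct method; the usual replacement is to call the Cell instance itself, as in this minimal sketch (the network and input are invented):

    import numpy as np
    import mindspore.nn as nn
    from mindspore import Tensor

    class Net(nn.Cell):
        def __init__(self):
            super().__init__()
            self.relu = nn.ReLU()

        def construct(self, x):
            return self.relu(x)

    net = Net()
    x = Tensor(np.array([-1.0, 2.0], np.float32))
    out = net(x)  # calling the Cell dispatches to construct; the deprecated helper is not needed
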
@@ -8,7 +8,7 @@ mindspore.nn.OneHot
 输入的 `indices` 表示的位置取值为on_value,其他所有位置取值为off_value。

 .. note::
-    如果indices是n阶Tensor,那么返回的one-hot Tensor则为n+1阶Tensor。
+    如果indices是n阶Tensor,那么返回的one-hot Tensor则为n+1阶Tensor,新增 `axis` 维度。

 如果 `indices` 是Scalar,则输出shape将是长度为 `depth` 的向量。

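A sketch of the rank change described in the note: a rank-1 `indices` produces a rank-2 one-hot tensor along the new `axis` dimension (depth and values are illustrative):

    import numpy as np
    import mindspore.nn as nn
    from mindspore import Tensor
    from mindspore import dtype as mstype

    onehot = nn.OneHot(axis=-1, depth=3, on_value=1.0, off_value=0.0)
    indices = Tensor(np.array([1, 0, 2]), mstype.int32)  # rank-1 input, shape (3,)
    output = onehot(indices)                             # rank-2 output, shape (3, 3)
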
@@ -14,7 +14,7 @@ mindspore.nn.SmoothL1Loss
     |x_i - y_i| - 0.5 {\beta}, & \text{otherwise.}
     \end{cases}

-当 `reduction` 不是设定为 `none` 时,计算如下:
+当 `reduction` 不是设定为 `none` 时,计算如下:

 .. math::
     L =

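A small numeric sketch of the piecewise formula above (values are invented; with beta = 1.0 the last element falls in the quadratic branch, 0.5 * 0.5^2 / beta = 0.125, and the default element-wise reduction returns one loss per element):

    import numpy as np
    import mindspore.nn as nn
    from mindspore import Tensor

    loss = nn.SmoothL1Loss(beta=1.0)  # element-wise output under the default reduction
    logits = Tensor(np.array([1.0, 2.0, 3.0], np.float32))
    labels = Tensor(np.array([1.0, 2.0, 2.5], np.float32))
    output = loss(logits, labels)     # [0.0, 0.0, 0.125]
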
@@ -1,2 +1,2 @@
-如果前向网络使用了SparseGatherV2等算子,优化器会执行稀疏运算,通过设置 `target` 为CPU,可在主机(host)上进行稀疏运算。
+如果前向网络使用了SparseGatherV2等算子,优化器会执行稀疏运算。通过设置 `target` 为CPU,可在主机(host)上进行稀疏运算。
 稀疏特性在持续开发中。

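A hedged sketch of placing the sparse optimizer computation on host, as the text describes; the network `net` with SparseGatherV2-style lookups is assumed to exist, and `nn.Adam` stands in for any optimizer that exposes the `target` attribute:

    import mindspore.nn as nn

    optimizer = nn.Adam(net.trainable_params(), learning_rate=0.1)
    optimizer.target = "CPU"  # run the sparse optimizer update on host instead of device
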
@@ -2,4 +2,4 @@
 
 参数分组情况下,可以分组调整权重衰减策略。
 
-分组时,每组网络参数均可配置 `weight_decay` ,若未配置,则该组网络参数使用优化器中配置的 `weight_decay` 。
+分组时,每组网络参数均可配置 `weight_decay` 。若未配置,则该组网络参数使用优化器中配置的 `weight_decay` 。

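A sketch of the per-group `weight_decay` fallback described above; the parameter split and the choice of `nn.Momentum` are invented, and any MindSpore optimizer that accepts grouped parameters behaves the same way:

    import mindspore.nn as nn

    conv_params = [p for p in net.trainable_params() if 'conv' in p.name]
    other_params = [p for p in net.trainable_params() if 'conv' not in p.name]
    group_params = [{'params': conv_params, 'weight_decay': 0.01},  # group-level value wins
                    {'params': other_params}]                       # no value: falls back to the optimizer's weight_decay
    optimizer = nn.Momentum(group_params, learning_rate=0.1, momentum=0.9, weight_decay=0.0)
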
@@ -9,7 +9,7 @@
 参数:
     - **data_parallel** (int) - 表示数据并行数。默认值:1。
     - **model_parallel** (int) - 表示模型并行数。默认值:1。
-    - **expert_parallel** (int) - 表示专家并行数,只有在应用混合专家结构(MoE,Mixture of Experts)时才会生效。默认值:1.
+    - **expert_parallel** (int) - 表示专家并行数,只有在应用混合专家结构(MoE,Mixture of Experts)时才会生效。默认值:1。
     - **pipeline_stage** (int) - 表示将Transformer切分成的stage数目。其值应为正数。默认值:1。
     - **micro_batch_num** (int) - 表示用于pipeline训练的batch的微型大小。默认值:1。
     - **optimizer_shard** (bool) - 表示是否使能优化器切分。默认值:False。

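These parameter names match the Transformer parallel configuration; assuming the page documents `TransformerOpParallelConfig` (an assumption, since the file name is not shown in this hunk), a construction sketch looks like this, with illustrative parallel degrees:

    from mindspore.nn.transformer import TransformerOpParallelConfig

    # 2-way data parallel x 2-way model parallel, single pipeline stage, no optimizer sharding
    parallel_config = TransformerOpParallelConfig(data_parallel=2, model_parallel=2,
                                                  expert_parallel=1, pipeline_stage=1,
                                                  micro_batch_num=1, optimizer_shard=False)
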
@@ -44,7 +44,7 @@ class L1Regularizer(Cell):
     r"""
     Applies l1 regularization to weights.
 
-    l1 regularization makes weights sparsity
+    l1 regularization makes weights sparsity.
 
     .. math::
         \text{loss}=\lambda * \text{reduce_sum}(\text{abs}(\omega))

@@ -52,7 +52,7 @@ class L1Regularizer(Cell):
     where :math:`\lambda` is `scale` .
 
     Note:
-        scale(regularization factor) should be a number which greater than 0
+        scale(regularization factor) should be a number which greater than 0.
 
     Args:
         scale (int, float): l1 regularization factor which greater than 0.

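A worked sketch of the formula from the docstring above, loss = scale * reduce_sum(abs(weights)); the weight values are illustrative:

    import numpy as np
    import mindspore.nn as nn
    from mindspore import Tensor
    from mindspore import dtype as mstype

    l1_reg = nn.L1Regularizer(scale=0.5)  # scale must be greater than 0
    weights = Tensor(np.array([[1.0, -2.0], [0.0, 3.0]]), mstype.float32)
    penalty = l1_reg(weights)             # 0.5 * (1 + 2 + 0 + 3) = 3.0
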
@@ -512,7 +512,7 @@ class SmoothL1Loss(LossBase):
         TypeError: If dtype of `logits` is not the same as `labels`.
         ValueError: If `beta` is less than or equal to 0.
         ValueError: If shape of `logits` is not the same as `labels`.
-        ValueError: The float64 data type of `logits` is support on Ascend platform.
+        TypeError: The float64 data type of `logits` is support on Ascend platform.
 
     Supported Platforms:
         ``Ascend`` ``GPU`` ``CPU``

@@ -639,7 +639,7 @@ class AdamWeightDecay(Optimizer):
 
     If parameters are not grouped, the `weight_decay` in optimizer will be applied on the network parameters without
     'beta' or 'gamma' in their names. Users can group parameters to change the strategy of decaying weight. When
-    parameters are grouped, each group can set `weight_decay`, if not, the `weight_decay` in optimizer will be
+    parameters are grouped, each group can set `weight_decay`. If not, the `weight_decay` in optimizer will be
     applied.
 
     Args:

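A grouping sketch for the AdamWeightDecay behaviour described above, keeping weight decay off the 'beta'/'gamma' parameters; the network `net` is assumed to exist:

    import mindspore.nn as nn

    params = net.trainable_params()
    decay_params = [p for p in params if 'beta' not in p.name and 'gamma' not in p.name]
    no_decay_params = [p for p in params if 'beta' in p.name or 'gamma' in p.name]
    group_params = [{'params': decay_params, 'weight_decay': 0.01},
                    {'params': no_decay_params, 'weight_decay': 0.0}]
    optimizer = nn.AdamWeightDecay(group_params, learning_rate=1e-3)
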
@@ -420,7 +420,7 @@ class AdaSumByGradWrapCell(Cell):
     Note:
         When using AdaSum, the number of traning cards needs to be a power of 2 and at least 16 cards are required.
         Currently, the optimizer sharding and pipeline parallel is not supported when using AdaSum.
-        It is recommended to using AdaSumByGradWrapCell in semi auto parallel/auto parallel mode, and in data parallel
+        It is recommended to using AdaSumByGradWrapCell in semi auto parallel/auto parallel mode. In data parallel
         mode, we recommend to using mindspore.boost to applying AdaSum.
 
     Args:

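A wrapping sketch for the note above, intended for semi-auto/auto parallel mode with a card count that is a power of two and at least 16; `net` is assumed to exist, and `AdaSumByDeltaWeightWrapCell` from the next hunk is wrapped the same way:

    import mindspore.nn as nn

    base_opt = nn.Momentum(params=net.trainable_params(), learning_rate=0.1, momentum=0.9)
    optimizer = nn.AdaSumByGradWrapCell(base_opt)  # wrap the base optimizer with AdaSum
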
@@ -487,8 +487,8 @@ class AdaSumByDeltaWeightWrapCell(Cell):
     Note:
         When using AdaSum, the number of traning cards needs to be a power of 2 and at least 16 cards are required.
         Currently, the optimizer sharding and pipeline parallel is not supported when using AdaSum.
-        It is recommended to using AdaSumByDeltaWeightWrapCell in semi auto parallel/auto parallel mode,
-        and in data parallel mode, we recommend to using mindspore.boost to applying AdaSum.
+        It is recommended to using AdaSumByDeltaWeightWrapCell in semi auto parallel/auto parallel mode.
+        In data parallel mode, we recommend to using mindspore.boost to applying AdaSum.
 
     Args:
         optimizer (Union[Cell]): Optimizer for updating the weights. The construct function of the optimizer

@@ -132,7 +132,7 @@ class Lamb(Optimizer):
     Note:
         There is usually no connection between a optimizer and mixed precision. But when `FixedLossScaleManager` is used
         and `drop_overflow_update` in `FixedLossScaleManager` is set to False, optimizer needs to set the 'loss_scale'.
-        As this optimizer has no argument of `loss_scale`, so `loss_scale` needs to be processed by other means, refer
+        As this optimizer has no argument of `loss_scale`, so `loss_scale` needs to be processed by other means. Refer
        document `LossScale <https://www.mindspore.cn/tutorials/experts/zh-CN/master/others/mixed_precision.html>`_ to
        process `loss_scale` correctly.

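A hedged sketch of one way to pair Lamb with a fixed loss scale, consistent with the note above; `net`, `loss_fn`, and the exact import paths are assumptions. Because Lamb takes no `loss_scale` argument, this sketch lets the manager drop overflowing updates rather than expecting the optimizer to unscale gradients:

    import mindspore.nn as nn
    from mindspore import Model, FixedLossScaleManager

    optimizer = nn.Lamb(net.trainable_params(), learning_rate=1e-3)
    manager = FixedLossScaleManager(loss_scale=1024.0, drop_overflow_update=True)
    model = Model(net, loss_fn=loss_fn, optimizer=optimizer, loss_scale_manager=manager)
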