modify the files

zhangyi 2022-08-15 17:41:47 +08:00
parent abf2225625
commit 1afdc74404
13 changed files with 16 additions and 16 deletions


@@ -8,7 +8,7 @@ mindspore.SparseTensor
`SparseTensor` can only be used in the constructor of a `Cell`.
.. note::
This interface has been deprecated since version 1.7 and is planned to be removed in the future; please use `COOTensor`.
This interface has been deprecated since version 1.7 and is planned to be removed in the future; please use `COOTensor`
For a dense tensor, the corresponding `SparseTensor(indices, values, shape)` satisfies `dense[indices[i]] = values[i]`
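As a quick illustration of the `dense[indices[i]] = values[i]` mapping described above, here is a minimal sketch using the recommended replacement `COOTensor` (the concrete indices and values are made up for the example):

```python
import mindspore
from mindspore import Tensor, COOTensor

# COO representation of the dense matrix
# [[0., 1., 0., 0.],
#  [0., 0., 2., 0.],
#  [0., 0., 0., 0.]]
indices = Tensor([[0, 1], [1, 2]], dtype=mindspore.int64)
values = Tensor([1.0, 2.0], dtype=mindspore.float32)
shape = (3, 4)
coo = COOTensor(indices, values, shape)  # dense[indices[i]] = values[i]
```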


@@ -23,7 +23,7 @@ mindspore.nn.BCELoss
\end{cases}
.. note::
The predicted value is generally the output of a sigmoid function. Since this is binary classification, the target value should be 0 or 1. If the input is 0 or 1, the loss function above is meaningless.
The predicted value is generally the output of a sigmoid function. Since this is binary classification, the target value should be 0 or 1. If the input is 0 or 1, the loss function above is meaningless.
Args:
- **weight** (Tensor, optional) - Specifies the weight of the binary cross-entropy for each batch. It has the same shape and data type as the input data. Default: None.
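A minimal sketch of the usage implied by the note above, assuming the standard `mindspore.nn.BCELoss` call convention and made-up values:

```python
import numpy as np
import mindspore
from mindspore import Tensor, nn

loss_fn = nn.BCELoss(reduction='mean')
# predictions are probabilities in (0, 1), e.g. the output of a sigmoid
probs = Tensor(np.array([0.1, 0.8, 0.6]), mindspore.float32)
# targets must be 0 or 1 for binary classification
labels = Tensor(np.array([0.0, 1.0, 1.0]), mindspore.float32)
loss = loss_fn(probs, labels)
```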


@@ -391,7 +391,7 @@
Runs the construct method.
.. note::
This function is deprecated and will be removed in a future version; using it is not recommended.
This function is deprecated and will be removed in a future version; using it is not recommended.
Args:
- **cast_inputs** (tuple) - Inputs of the Cell.


@@ -8,7 +8,7 @@ mindspore.nn.OneHot
The positions indicated by the input `indices` take the value on_value, and all other positions take the value off_value.
.. note::
If `indices` is a Tensor of rank n, the returned one-hot Tensor will be of rank n+1.
If `indices` is a Tensor of rank n, the returned one-hot Tensor will be of rank n+1, with the new dimension created at `axis`
If `indices` is a Scalar, the output shape will be a vector of length `depth`.
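A small sketch of the rank behaviour described in the note, with made-up indices:

```python
import mindspore
from mindspore import Tensor, nn

onehot = nn.OneHot(depth=3)                # new dimension is created at the default axis=-1
indices = Tensor([1, 0], mindspore.int32)  # rank-1 input of length 2
output = onehot(indices)                   # rank-2 output of shape (2, 3)
```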


@@ -14,7 +14,7 @@ mindspore.nn.SmoothL1Loss
|x_i - y_i| - 0.5 {\beta}, & \text{otherwise.}
\end{cases}
When `reduction` is not set to `none`, the loss is computed as follows:
When `reduction` is not set to `none`, the loss is computed as follows
.. math::
L =


@@ -1,2 +1,2 @@
If the forward network uses operators such as SparseGatherV2, the optimizer will perform sparse operations. By setting `target` to CPU, the sparse operations can be performed on the host.
If the forward network uses operators such as SparseGatherV2, the optimizer will perform sparse operations. By setting `target` to CPU, the sparse operations can be performed on the host.
The sparse feature is still under continuous development.
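A minimal sketch of setting `target`, assuming an optimizer that supports sparse computation (the network and hyperparameters are placeholders):

```python
from mindspore import nn

# placeholder network; in practice its forward pass would use sparse operators
# such as SparseGatherV2
net = nn.Dense(16, 4)
opt = nn.LazyAdam(net.trainable_params(), learning_rate=0.01)
opt.target = "CPU"  # run the optimizer's sparse computation on the host
```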


@@ -2,4 +2,4 @@
When parameters are grouped, the weight decay strategy can be adjusted per group.
When grouping, each group of network parameters can be configured with its own `weight_decay`; if it is not configured, that group of network parameters uses the `weight_decay` configured in the optimizer.
When grouping, each group of network parameters can be configured with its own `weight_decay`; if it is not configured, that group of network parameters uses the `weight_decay` configured in the optimizer.
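A minimal sketch of this grouping, using a small placeholder network and the standard `params`/`weight_decay` group keys:

```python
from mindspore import nn

net = nn.Dense(16, 4)  # placeholder network, just to have parameters to group

decay_params = [p for p in net.trainable_params() if 'bias' not in p.name]
other_params = [p for p in net.trainable_params() if 'bias' in p.name]
group_params = [
    {'params': decay_params, 'weight_decay': 0.01},  # group-level weight decay
    {'params': other_params},                        # falls back to the optimizer-level value below
]
opt = nn.Momentum(group_params, learning_rate=0.1, momentum=0.9, weight_decay=0.0)
```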


@@ -9,7 +9,7 @@
Args:
- **data_parallel** (int) - The data parallel degree. Default: 1.
- **model_parallel** (int) - The model parallel degree. Default: 1.
- **expert_parallel** (int) - The expert parallel degree, which only takes effect when a Mixture of Experts (MoE) structure is applied. Default: 1.
- **expert_parallel** (int) - The expert parallel degree, which only takes effect when a Mixture of Experts (MoE) structure is applied. Default: 1.
- **pipeline_stage** (int) - The number of stages into which the Transformer is split. The value should be positive. Default: 1.
- **micro_batch_num** (int) - The micro size of the batches used in pipeline training. Default: 1.
- **optimizer_shard** (bool) - Whether to enable optimizer sharding. Default: False.
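The hunk does not show which class these parameters belong to; assuming it is `TransformerOpParallelConfig` from `mindspore.nn.transformer`, a hedged sketch of a possible 8-card layout looks like this:

```python
from mindspore.nn.transformer import TransformerOpParallelConfig

# hypothetical layout: 2-way data parallel x 4-way model parallel on 8 cards
parallel_config = TransformerOpParallelConfig(data_parallel=2, model_parallel=4,
                                              pipeline_stage=1, micro_batch_num=1,
                                              optimizer_shard=False)
```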


@@ -44,7 +44,7 @@ class L1Regularizer(Cell):
r"""
Applies l1 regularization to weights.
l1 regularization makes weights sparse
l1 regularization makes weights sparse.
.. math::
\text{loss}=\lambda * \text{reduce_sum}(\text{abs}(\omega))
@@ -52,7 +52,7 @@ class L1Regularizer(Cell):
where :math:`\lambda` is `scale` .
Note:
scale (regularization factor) should be a number greater than 0
scale (regularization factor) should be a number greater than 0.
Args:
scale (int, float): l1 regularization factor, which must be greater than 0.
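A minimal sketch of the formula above, with a made-up weight tensor:

```python
import numpy as np
import mindspore
from mindspore import Tensor, nn

l1 = nn.L1Regularizer(scale=0.5)
weights = Tensor(np.array([[1.0, -2.0], [0.0, 3.0]]), mindspore.float32)
penalty = l1(weights)  # 0.5 * (|1| + |-2| + |0| + |3|) = 3.0
```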


@@ -512,7 +512,7 @@ class SmoothL1Loss(LossBase):
TypeError: If dtype of `logits` is not the same as `labels`.
ValueError: If `beta` is less than or equal to 0.
ValueError: If shape of `logits` is not the same as `labels`.
ValueError: The float64 data type of `logits` is supported only on the Ascend platform.
TypeError: The float64 data type of `logits` is supported only on the Ascend platform.
Supported Platforms:
``Ascend`` ``GPU`` ``CPU``
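A short usage sketch of `SmoothL1Loss` with made-up inputs (the `reduction` argument is taken from the corresponding docs hunk earlier in this commit):

```python
import numpy as np
import mindspore
from mindspore import Tensor, nn

loss_fn = nn.SmoothL1Loss(beta=1.0, reduction='mean')
logits = Tensor(np.array([1.0, 2.0, 3.0]), mindspore.float32)
labels = Tensor(np.array([1.0, 2.0, 2.0]), mindspore.float32)
loss = loss_fn(logits, labels)  # elementwise losses [0, 0, 0.5], averaged to ~0.167
```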


@@ -639,7 +639,7 @@ class AdamWeightDecay(Optimizer):
If parameters are not grouped, the `weight_decay` in optimizer will be applied on the network parameters without
'beta' or 'gamma' in their names. Users can group parameters to change the strategy of decaying weight. When
parameters are grouped, each group can set `weight_decay`, if not, the `weight_decay` in optimizer will be
parameters are grouped, each group can set `weight_decay`. If not, the `weight_decay` in optimizer will be
applied.
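A minimal sketch of the grouping strategy described above, with a small placeholder network whose normalization parameters are named 'gamma' and 'beta':

```python
from mindspore import nn

net = nn.SequentialCell(nn.Dense(16, 8), nn.LayerNorm([8]))  # placeholder network

decay_params = [p for p in net.trainable_params()
                if 'beta' not in p.name and 'gamma' not in p.name]
no_decay_params = [p for p in net.trainable_params()
                   if 'beta' in p.name or 'gamma' in p.name]
group_params = [
    {'params': decay_params, 'weight_decay': 0.01},
    {'params': no_decay_params, 'weight_decay': 0.0},
]
opt = nn.AdamWeightDecay(group_params, learning_rate=1e-3)
```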
Args:


@@ -420,7 +420,7 @@ class AdaSumByGradWrapCell(Cell):
Note:
When using AdaSum, the number of training cards needs to be a power of 2 and at least 16 cards are required.
Currently, optimizer sharding and pipeline parallelism are not supported when using AdaSum.
It is recommended to use AdaSumByGradWrapCell in semi auto parallel/auto parallel mode, and in data parallel
It is recommended to use AdaSumByGradWrapCell in semi auto parallel/auto parallel mode. In data parallel
mode, we recommend using mindspore.boost to apply AdaSum.
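A minimal wrapping sketch with a placeholder network (actually running it requires the (semi) auto parallel setup described in the note):

```python
from mindspore import nn

net = nn.Dense(16, 4)  # placeholder network
opt = nn.AdaSumByGradWrapCell(
    nn.Momentum(params=net.trainable_params(), learning_rate=0.1, momentum=0.9))
```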
Args:
@@ -487,8 +487,8 @@ class AdaSumByDeltaWeightWrapCell(Cell):
Note:
When using AdaSum, the number of training cards needs to be a power of 2 and at least 16 cards are required.
Currently, optimizer sharding and pipeline parallelism are not supported when using AdaSum.
It is recommended to use AdaSumByDeltaWeightWrapCell in semi auto parallel/auto parallel mode,
and in data parallel mode, we recommend using mindspore.boost to apply AdaSum.
It is recommended to use AdaSumByDeltaWeightWrapCell in semi auto parallel/auto parallel mode.
In data parallel mode, we recommend using mindspore.boost to apply AdaSum.
Args:
optimizer (Union[Cell]): Optimizer for updating the weights. The construct function of the optimizer


@@ -132,7 +132,7 @@ class Lamb(Optimizer):
Note:
There is usually no connection between an optimizer and mixed precision. But when `FixedLossScaleManager` is used
and `drop_overflow_update` in `FixedLossScaleManager` is set to False, the optimizer needs to set the 'loss_scale'.
As this optimizer has no argument of `loss_scale`, `loss_scale` needs to be processed by other means, refer to the
As this optimizer has no argument of `loss_scale`, `loss_scale` needs to be processed by other means. Refer to the
document `LossScale <https://www.mindspore.cn/tutorials/experts/zh-CN/master/others/mixed_precision.html>`_ to
process `loss_scale` correctly.
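A hedged sketch of one way to handle this, assuming the usual `Model`/`FixedLossScaleManager` workflow and placeholder network and loss; with `drop_overflow_update=True` (the default), scaling and unscaling happen inside the training wrapper, so the optimizer itself never needs a `loss_scale` argument:

```python
from mindspore import Model, nn
from mindspore.train.loss_scale_manager import FixedLossScaleManager

net = nn.Dense(16, 4)   # placeholder network
loss_fn = nn.MSELoss()  # placeholder loss
opt = nn.Lamb(params=net.trainable_params(), learning_rate=1e-3)  # note: no loss_scale argument
# gradients are unscaled by the loss-scale training wrapper before the optimizer runs
loss_scale_manager = FixedLossScaleManager(loss_scale=1024, drop_overflow_update=True)
model = Model(net, loss_fn=loss_fn, optimizer=opt, loss_scale_manager=loss_scale_manager)
```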