forked from mindspore-Ecosystem/mindspore

modify the files

parent abf2225625
commit 1afdc74404
@@ -8,7 +8,7 @@ mindspore.SparseTensor
 `SparseTensor` 只能在 `Cell` 的构造方法中使用。

 .. note::
-    此接口从 1.7 版本开始弃用,并计划在将来移除,请使用 `COOTensor`.
+    此接口从 1.7 版本开始弃用,并计划在将来移除,请使用 `COOTensor`。

 对于稠密张量,其 `SparseTensor(indices, values, shape)` 具有 `dense[indices[i]] = values[i]` 。

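Since the note above points users at `COOTensor`, here is a minimal sketch of the replacement API (index positions, values, and shape are invented for illustration):

    import mindspore as ms
    from mindspore import Tensor, COOTensor

    # dense[0][1] = 1.0 and dense[1][2] = 2.0, all other entries are zero
    indices = Tensor([[0, 1], [1, 2]], dtype=ms.int32)
    values = Tensor([1.0, 2.0], dtype=ms.float32)
    shape = (3, 4)
    coo_tensor = COOTensor(indices, values, shape)
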
@@ -23,7 +23,7 @@ mindspore.nn.BCELoss
     \end{cases}

 .. note::
-    预测值一般是sigmoid函数的输出,因为是二分类,所以目标值应是0或者1。如果输入是0或1,则上述损失函数是无意义的。
+    预测值一般是sigmoid函数的输出。因为是二分类,所以目标值应是0或者1。如果输入是0或1,则上述损失函数是无意义的。

 参数:
     - **weight** (Tensor, 可选) - 指定每个批次二值交叉熵的权重。与输入数据的shape和数据类型相同。默认值:None。

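A hedged usage sketch of `mindspore.nn.BCELoss` matching the note above; shapes and values are invented for illustration, and `weight` shares the shape and dtype of the inputs:

    import numpy as np
    import mindspore.nn as nn
    from mindspore import Tensor

    weight = Tensor(np.ones((2, 3)).astype(np.float32))
    loss = nn.BCELoss(weight=weight, reduction='mean')
    logits = Tensor(np.array([[0.1, 0.8, 0.9], [0.5, 0.3, 0.2]], np.float32))  # sigmoid-style outputs in (0, 1)
    labels = Tensor(np.array([[0., 1., 1.], [1., 0., 0.]], np.float32))        # binary targets, 0 or 1
    output = loss(logits, labels)
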
@@ -391,7 +391,7 @@
 运行construct方法。

 .. note::
-    该函数已经弃用,将会在未来版本中删除,不推荐使用此函数。
+    该函数已经弃用,将会在未来版本中删除。不推荐使用此函数。

 参数:
     - **cast_inputs** (tuple) - Cell的输入。

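The deprecated helper above simply runs a Cell's construct method; the usual replacement is to call the Cell instance itself, as in this minimal sketch (the network and input are invented):

    import numpy as np
    import mindspore.nn as nn
    from mindspore import Tensor

    class Net(nn.Cell):
        def __init__(self):
            super().__init__()
            self.relu = nn.ReLU()

        def construct(self, x):
            return self.relu(x)

    net = Net()
    x = Tensor(np.array([-1.0, 2.0], np.float32))
    out = net(x)  # calling the Cell dispatches to construct; the deprecated helper is not needed
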
@@ -8,7 +8,7 @@ mindspore.nn.OneHot
 输入的 `indices` 表示的位置取值为on_value,其他所有位置取值为off_value。

 .. note::
-    如果indices是n阶Tensor,那么返回的one-hot Tensor则为n+1阶Tensor。
+    如果indices是n阶Tensor,那么返回的one-hot Tensor则为n+1阶Tensor,新增 `axis` 维度。

 如果 `indices` 是Scalar,则输出shape将是长度为 `depth` 的向量。

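A sketch of the rank change described in the note: a rank-1 `indices` produces a rank-2 one-hot tensor along the new `axis` dimension (depth and values are illustrative):

    import numpy as np
    import mindspore.nn as nn
    from mindspore import Tensor
    from mindspore import dtype as mstype

    onehot = nn.OneHot(axis=-1, depth=3, on_value=1.0, off_value=0.0)
    indices = Tensor(np.array([1, 0, 2]), mstype.int32)  # rank-1 input, shape (3,)
    output = onehot(indices)                             # rank-2 output, shape (3, 3)
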
@@ -14,7 +14,7 @@ mindspore.nn.SmoothL1Loss
     |x_i - y_i| - 0.5 {\beta}, & \text{otherwise.}
     \end{cases}

-当 `reduction` 不是设定为 `none` 时,计算如下:
+当 `reduction` 不是设定为 `none` 时,计算如下:

 .. math::
     L =

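A small numeric sketch of the piecewise formula above (values are invented; with beta = 1.0 the last element falls in the quadratic branch, 0.5 * 0.5^2 / beta = 0.125, and the default element-wise reduction returns one loss per element):

    import numpy as np
    import mindspore.nn as nn
    from mindspore import Tensor

    loss = nn.SmoothL1Loss(beta=1.0)  # element-wise output under the default reduction
    logits = Tensor(np.array([1.0, 2.0, 3.0], np.float32))
    labels = Tensor(np.array([1.0, 2.0, 2.5], np.float32))
    output = loss(logits, labels)     # [0.0, 0.0, 0.125]
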
@@ -1,2 +1,2 @@
-如果前向网络使用了SparseGatherV2等算子,优化器会执行稀疏运算,通过设置 `target` 为CPU,可在主机(host)上进行稀疏运算。
+如果前向网络使用了SparseGatherV2等算子,优化器会执行稀疏运算。通过设置 `target` 为CPU,可在主机(host)上进行稀疏运算。
 稀疏特性在持续开发中。

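A hedged sketch of placing the sparse optimizer computation on host, as the text describes; the network `net` with SparseGatherV2-style lookups is assumed to exist, and `nn.Adam` stands in for any optimizer that exposes the `target` attribute:

    import mindspore.nn as nn

    optimizer = nn.Adam(net.trainable_params(), learning_rate=0.1)
    optimizer.target = "CPU"  # run the sparse optimizer update on host instead of device
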
@@ -2,4 +2,4 @@
 
 参数分组情况下,可以分组调整权重衰减策略。
 
-分组时,每组网络参数均可配置 `weight_decay` ,若未配置,则该组网络参数使用优化器中配置的 `weight_decay` 。
+分组时,每组网络参数均可配置 `weight_decay` 。若未配置,则该组网络参数使用优化器中配置的 `weight_decay` 。

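A sketch of the per-group `weight_decay` fallback described above; the parameter split and the choice of `nn.Momentum` are invented, and any MindSpore optimizer that accepts grouped parameters behaves the same way:

    import mindspore.nn as nn

    conv_params = [p for p in net.trainable_params() if 'conv' in p.name]
    other_params = [p for p in net.trainable_params() if 'conv' not in p.name]
    group_params = [{'params': conv_params, 'weight_decay': 0.01},  # group-level value wins
                    {'params': other_params}]                       # no value: falls back to the optimizer's weight_decay
    optimizer = nn.Momentum(group_params, learning_rate=0.1, momentum=0.9, weight_decay=0.0)
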
@@ -9,7 +9,7 @@
 参数:
     - **data_parallel** (int) - 表示数据并行数。默认值:1。
     - **model_parallel** (int) - 表示模型并行数。默认值:1。
-    - **expert_parallel** (int) - 表示专家并行数,只有在应用混合专家结构(MoE,Mixture of Experts)时才会生效。默认值:1.
+    - **expert_parallel** (int) - 表示专家并行数,只有在应用混合专家结构(MoE,Mixture of Experts)时才会生效。默认值:1。
     - **pipeline_stage** (int) - 表示将Transformer切分成的stage数目。其值应为正数。默认值:1。
     - **micro_batch_num** (int) - 表示用于pipeline训练的batch的微型大小。默认值:1。
     - **optimizer_shard** (bool) - 表示是否使能优化器切分。默认值:False。

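These parameter names match the Transformer parallel configuration; assuming the page documents `TransformerOpParallelConfig` (an assumption, since the file name is not shown in this hunk), a construction sketch looks like this, with illustrative parallel degrees:

    from mindspore.nn.transformer import TransformerOpParallelConfig

    # 2-way data parallel x 2-way model parallel, single pipeline stage, no optimizer sharding
    parallel_config = TransformerOpParallelConfig(data_parallel=2, model_parallel=2,
                                                  expert_parallel=1, pipeline_stage=1,
                                                  micro_batch_num=1, optimizer_shard=False)
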
@@ -44,7 +44,7 @@ class L1Regularizer(Cell):
     r"""
     Applies l1 regularization to weights.
 
-    l1 regularization makes weights sparsity
+    l1 regularization makes weights sparsity.
 
     .. math::
         \text{loss}=\lambda * \text{reduce_sum}(\text{abs}(\omega))

@@ -52,7 +52,7 @@ class L1Regularizer(Cell):
     where :math:`\lambda` is `scale` .
 
     Note:
-        scale(regularization factor) should be a number which greater than 0
+        scale(regularization factor) should be a number which greater than 0.
 
     Args:
         scale (int, float): l1 regularization factor which greater than 0.

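A worked sketch of the formula from the docstring above, loss = scale * reduce_sum(abs(weights)); the weight values are illustrative:

    import numpy as np
    import mindspore.nn as nn
    from mindspore import Tensor
    from mindspore import dtype as mstype

    l1_reg = nn.L1Regularizer(scale=0.5)  # scale must be greater than 0
    weights = Tensor(np.array([[1.0, -2.0], [0.0, 3.0]]), mstype.float32)
    penalty = l1_reg(weights)             # 0.5 * (1 + 2 + 0 + 3) = 3.0
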
@@ -512,7 +512,7 @@ class SmoothL1Loss(LossBase):
         TypeError: If dtype of `logits` is not the same as `labels`.
         ValueError: If `beta` is less than or equal to 0.
         ValueError: If shape of `logits` is not the same as `labels`.
-        ValueError: The float64 data type of `logits` is support on Ascend platform.
+        TypeError: The float64 data type of `logits` is support on Ascend platform.
 
     Supported Platforms:
         ``Ascend`` ``GPU`` ``CPU``

@@ -639,7 +639,7 @@ class AdamWeightDecay(Optimizer):
 
     If parameters are not grouped, the `weight_decay` in optimizer will be applied on the network parameters without
     'beta' or 'gamma' in their names. Users can group parameters to change the strategy of decaying weight. When
-    parameters are grouped, each group can set `weight_decay`, if not, the `weight_decay` in optimizer will be
+    parameters are grouped, each group can set `weight_decay`. If not, the `weight_decay` in optimizer will be
     applied.
 
     Args:

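A grouping sketch for the AdamWeightDecay behaviour described above, keeping weight decay off the 'beta'/'gamma' parameters; the network `net` is assumed to exist:

    import mindspore.nn as nn

    params = net.trainable_params()
    decay_params = [p for p in params if 'beta' not in p.name and 'gamma' not in p.name]
    no_decay_params = [p for p in params if 'beta' in p.name or 'gamma' in p.name]
    group_params = [{'params': decay_params, 'weight_decay': 0.01},
                    {'params': no_decay_params, 'weight_decay': 0.0}]
    optimizer = nn.AdamWeightDecay(group_params, learning_rate=1e-3)
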
@@ -420,7 +420,7 @@ class AdaSumByGradWrapCell(Cell):
     Note:
         When using AdaSum, the number of traning cards needs to be a power of 2 and at least 16 cards are required.
         Currently, the optimizer sharding and pipeline parallel is not supported when using AdaSum.
-        It is recommended to using AdaSumByGradWrapCell in semi auto parallel/auto parallel mode, and in data parallel
+        It is recommended to using AdaSumByGradWrapCell in semi auto parallel/auto parallel mode. In data parallel
         mode, we recommend to using mindspore.boost to applying AdaSum.
 
     Args:

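A wrapping sketch for the note above, intended for semi-auto/auto parallel mode with a card count that is a power of two and at least 16; `net` is assumed to exist, and `AdaSumByDeltaWeightWrapCell` from the next hunk is wrapped the same way:

    import mindspore.nn as nn

    base_opt = nn.Momentum(params=net.trainable_params(), learning_rate=0.1, momentum=0.9)
    optimizer = nn.AdaSumByGradWrapCell(base_opt)  # wrap the base optimizer with AdaSum
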
@@ -487,8 +487,8 @@ class AdaSumByDeltaWeightWrapCell(Cell):
     Note:
         When using AdaSum, the number of traning cards needs to be a power of 2 and at least 16 cards are required.
         Currently, the optimizer sharding and pipeline parallel is not supported when using AdaSum.
-        It is recommended to using AdaSumByDeltaWeightWrapCell in semi auto parallel/auto parallel mode,
-        and in data parallel mode, we recommend to using mindspore.boost to applying AdaSum.
+        It is recommended to using AdaSumByDeltaWeightWrapCell in semi auto parallel/auto parallel mode.
+        In data parallel mode, we recommend to using mindspore.boost to applying AdaSum.
 
     Args:
         optimizer (Union[Cell]): Optimizer for updating the weights. The construct function of the optimizer

@@ -132,7 +132,7 @@ class Lamb(Optimizer):
     Note:
         There is usually no connection between a optimizer and mixed precision. But when `FixedLossScaleManager` is used
         and `drop_overflow_update` in `FixedLossScaleManager` is set to False, optimizer needs to set the 'loss_scale'.
-        As this optimizer has no argument of `loss_scale`, so `loss_scale` needs to be processed by other means, refer
+        As this optimizer has no argument of `loss_scale`, so `loss_scale` needs to be processed by other means. Refer
        document `LossScale <https://www.mindspore.cn/tutorials/experts/zh-CN/master/others/mixed_precision.html>`_ to
        process `loss_scale` correctly.

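A hedged sketch of one way to pair Lamb with a fixed loss scale, consistent with the note above; `net`, `loss_fn`, and the exact import paths are assumptions. Because Lamb takes no `loss_scale` argument, this sketch lets the manager drop overflowing updates rather than expecting the optimizer to unscale gradients:

    import mindspore.nn as nn
    from mindspore import Model, FixedLossScaleManager

    optimizer = nn.Lamb(net.trainable_params(), learning_rate=1e-3)
    manager = FixedLossScaleManager(loss_scale=1024.0, drop_overflow_update=True)
    model = Model(net, loss_fn=loss_fn, optimizer=optimizer, loss_scale_manager=manager)
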