!28817 modify api comments

Merge pull request !28817 from wangnan39/code_docs_modify_api_comments
This commit is contained in:
i-robot 2022-01-12 02:02:42 +00:00 committed by Gitee
commit 35ca76afb9
GPG Key ID: 173E9B9CA92EEF8F
37 changed files with 118 additions and 117 deletions

View File

@ -13,7 +13,7 @@ mindspore.DatasetHelper
**Parameters:**
- **dataset** (Dataset) - The training dataset iterator. The dataset can be generated by the dataset generator APIs in :class:`mindspore.dataset`, such as :class:`mindspore.dataset.ImageFolderDataset`.
- **dataset_sink_mode** (bool) - If True, use :class:`mindspore.ops.GetNext` to fetch data from the data channel on the device (Device); otherwise, iterate the dataset directly on the host. Default: True.
- **dataset_sink_mode** (bool) - If True, use :class:`mindspore.ops.GetNext` to fetch data from the data channel on the device (Device); otherwise, iterate the dataset directly on the host (Host). Default: True.
- **sink_size** (int) - Controls the amount of data in each sink. If `sink_size` is -1, sink the complete dataset for each epoch. If `sink_size` is greater than 0, sink `sink_size` data for each epoch. Default: -1.
- **epoch_num** (int) - Controls the number of epochs of data to send. Default: 1.
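Not part of this commit: a minimal usage sketch, assuming a toy in-memory dataset built with NumpySlicesDataset and host-side iteration (dataset_sink_mode=False)::

    import numpy as np
    import mindspore.dataset as ds
    from mindspore import DatasetHelper

    data = ds.NumpySlicesDataset({"x": np.random.rand(8, 2).astype(np.float32)}, shuffle=False)
    helper = DatasetHelper(data, dataset_sink_mode=False, sink_size=-1, epoch_num=1)
    for item in helper:
        print(item)  # one step of data, fetched on the host because sink mode is off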

View File

@ -3,7 +3,7 @@ mindspore.DynamicLossScaleManager
.. py:class:: mindspore.DynamicLossScaleManager(init_loss_scale=2**24, scale_factor=2, scale_window=2000)
Manager that dynamically adjusts the gradient amplification coefficient, inheriting from :class:`mindspore.LossScaleManager`.
Manager that dynamically adjusts the loss scale coefficient, inheriting from :class:`mindspore.LossScaleManager`.
**Parameters:**
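Not part of this diff: a hedged sketch of how the manager is typically queried (values are illustrative only)::

    from mindspore import DynamicLossScaleManager

    manager = DynamicLossScaleManager(init_loss_scale=2**24, scale_factor=2, scale_window=2000)
    print(manager.get_loss_scale())          # current loss scale, 2**24 right after construction
    update_cell = manager.get_update_cell()  # Cell that the training wrapper calls to adjust the scale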

View File

@ -3,7 +3,7 @@ mindspore.FixedLossScaleManager
.. py:class:: mindspore.FixedLossScaleManager(loss_scale=128.0, drop_overflow_update=True)
Manager with a fixed gradient amplification coefficient, inheriting from :class:`mindspore.LossScaleManager`.
Manager with a fixed loss scale coefficient, inheriting from :class:`mindspore.LossScaleManager`.
**Parameters:**

View File

@ -3,7 +3,7 @@ mindspore.LossScaleManager
.. py:class:: mindspore.LossScaleManager
Abstract class of the gradient amplification coefficient (loss scale) manager for mixed precision.
Abstract class used to manage the loss scale coefficient (loss scale) when using mixed precision.
Derived classes need to implement all methods of this class. `get_loss_scale` is used to get the current gradient amplification coefficient. `update_loss_scale` is used to update the gradient amplification coefficient and will be called during training. `get_update_cell` is used to get the `Cell` instance that updates the gradient amplification coefficient; this instance will be called during training. Currently the `get_update_cell` approach is used in most cases.

View File

@ -15,20 +15,20 @@
- **eval_indexes** (list) - Used when `eval_network` is defined. If `eval_indexes` is the default value None, `Model` passes all outputs of `eval_network` to `metrics`. If `eval_indexes` is set, it must contain three elements: the positions of the loss value, the predicted value and the label in the outputs of `eval_network`. In that case, the loss value is passed to the loss metric, and the predicted value and the label are passed to the other metrics. It is recommended to use `mindspore.nn.Metric.set_indexes` of the metric instead of `eval_indexes`. Default: None.
- **amp_level** (str) - Option for the argument `level` of `mindspore.build_train_network`. `level` is the mixed precision level, which supports ["O0", "O2", "O3", "auto"]. Default: "O0".
- O0: Do not change.
- O2: Cast the network to float16, keep batchnorm in float32, and use the strategy of dynamically adjusting the gradient amplification coefficient (loss scale).
- O3: Cast the network (including batchnorm) to float16, without the gradient adjustment strategy.
- auto: Set the mixed precision level recommended by experts for different devices, e.g. O2 on GPU and O3 on Ascend. This setting may not be suitable in some scenarios; users are advised to set `amp_level` according to the specific network model.
- "O0": Do not change.
- "O2": Cast the network to float16, keep BatchNorm in float32, and use the strategy of dynamically adjusting the loss scale coefficient (loss scale).
- "O3": Cast the network (including BatchNorm) to float16, without the loss scale strategy.
- auto: Set the mixed precision level recommended by experts for different devices, e.g. "O2" on GPU and "O3" on Ascend. This setting may not be suitable in some scenarios; users are advised to set `amp_level` according to the specific network model.
O2 is recommended on GPU, O3 is recommended on Ascend.
Setting `keep_batchnorm_fp32` through `kwargs` changes the batchnorm strategy; `keep_batchnorm_fp32` must be bool. Setting `loss_scale_manager` through `kwargs` changes the gradient amplification strategy; `loss_scale_manager` must be a subclass of :class:`mindspore.LossScaleManager`.
"O2" is recommended on GPU, "O3" is recommended on Ascend.
Setting `keep_batchnorm_fp32` through `kwargs` changes the precision strategy of BatchNorm; `keep_batchnorm_fp32` must be bool. Setting `loss_scale_manager` through `kwargs` changes the loss scale strategy; `loss_scale_manager` must be a subclass of :class:`mindspore.LossScaleManager`.
For more details about `amp_level`, see `mindspore.build_train_network`.
- **boost_level** (str) - Option for the argument `level` in `mindspore.boost`, the training level of boost mode. Supports ["O0", "O1", "O2"]. Default: "O0".
- O0: Do not change.
- O1: Enable boost mode; performance improves by about 20% and accuracy stays the same.
- O2: Enable boost mode; performance improves by about 30% and accuracy drops by about 3%.
- "O0": Do not change.
- "O1": Enable boost mode; performance improves by about 20% and accuracy stays the same as the original.
- "O2": Enable boost mode; performance improves by about 30% and accuracy drops by less than 3%.
If you want to configure boost mode yourself, you can set `boost_config_dict` as in `boost.py`.
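As a hedged illustration of these options (not part of this commit; the stand-in network, loss function and optimizer below are assumptions)::

    import mindspore.nn as nn
    from mindspore import Model, FixedLossScaleManager

    net = nn.Dense(16, 10)
    loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True)
    opt = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9)
    model = Model(net, loss_fn=loss, optimizer=opt, metrics={"accuracy"},
                  amp_level="O2", boost_level="O0",
                  loss_scale_manager=FixedLossScaleManager(loss_scale=1024.0),
                  keep_batchnorm_fp32=True)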
@ -105,7 +105,8 @@
When using PyNative mode or a CPU device, the evaluation process runs in non-dataset-sink mode.
.. note::
If `dataset_sink_mode` is set to True, data will be sent to the device. If the device is Ascend, data features will be transferred one by one, and the limit of each data transfer is 256M. If `dataset_sink_mode` is set to True, the dataset can only be used by the current model. This interface builds and then executes the computational graph; if `Model.build` has been executed beforehand, it executes the graph directly without building it.
If `dataset_sink_mode` is set to True, data will be sent to the device. At this point, the dataset is bound to the model and can only be used by the current model. If the device is Ascend, data features will be transferred one by one, and the limit of each data transfer is 256M.
This interface builds and then executes the computational graph; if `Model.build` has been executed beforehand, it executes the graph directly without building it.
**Parameters:**
@ -115,7 +116,7 @@
**Returns:**
Dict, whose keys are the user-defined metric names and whose values are the evaluation results run in inference mode.
Dict, where the key is the user-defined metric name and the value is the evaluation result run in inference mode.
**Examples:**
@ -145,7 +146,7 @@
**Parameters:**
- **predict_data** (Tensor) - Predict data, a single tensor or multiple tensors.
- **predict_data** (Tensor) - The predict samples; the data can be a single tensor, a list of tensors, or a tuple of tensors.
**Returns:**
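Not part of this diff: a hedged sketch of passing a single tensor as `predict_data` through `Model.predict` (the network and input shape are assumptions)::

    import numpy as np
    import mindspore.nn as nn
    from mindspore import Model, Tensor

    net = nn.Dense(4, 2)
    model = Model(net)
    out = model.predict(Tensor(np.random.rand(1, 4).astype(np.float32)))
    print(out.shape)  # a list or tuple of tensors may be passed instead of a single tensor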

View File

@ -7,27 +7,26 @@ mindspore.build_train_network
**Parameters:**
- **network** (Cell) - The MindSpore network structure.
- **optimizer** (Optimizer) - The optimizer, used to update the parameters.
- **loss_fn** (Union[None, Cell]) - Definition of the loss function. If None, the network structure should contain the loss function. Default: None.
- **network** (Cell) - Defines the network structure.
- **optimizer** (Optimizer) - Defines the optimizer, used to update the weight parameters.
- **loss_fn** (Union[None, Cell]) - Defines the loss function. If None, `network` should contain the loss function. Default: None.
- **level** (str) - Supports ["O0", "O2", "O3", "auto"]. Default: "O0".
- **O0** - No precision change.
- **O2** - Run the network in float16 precision; if the network contains `batchnorm` and `loss_fn`, keep them running in float32.
- **O3** - Run the network in float16 precision and set `keep_batchnorm_fp32` to False.
- **auto** - Set different levels for different backends: O2 on GPU, O3 on Ascend. The automatically chosen option is a system recommendation and may not apply in special scenarios; users can set it according to the actual network. O2 is recommended on GPU, O3 on Ascend. The `keep_batchnorm_fp32`, `cast_model_type` and `loss_scale_manager` properties are determined automatically by level and may be overridden by the `kwargs` arguments.
- **"O0"** - Do not change.
- **"O2"** - Cast the network to float16, keep `BatchNorm` and `loss_fn` in float32, and use the strategy of dynamically adjusting the loss scale coefficient (loss scale).
- **"O3"** - Cast the network to float16, do not use the loss scale strategy, and set `keep_batchnorm_fp32` to False.
- **auto** - Set the mixed precision level recommended by experts for different devices, e.g. "O2" on GPU and "O3" on Ascend. This setting may not be suitable in some scenarios; users are advised to set `amp_level` according to the specific network model. The `keep_batchnorm_fp32`, `cast_model_type` and `loss_scale_manager` properties are determined automatically by level.
- **boost_level** (str) - Option for the argument `level` in `mindspore.boost`, which sets the training level of boost mode. Supports ["O0", "O1", "O2"]. Default: "O0".
- **O0** - No precision change.
- **O2** - Enable boost mode; performance improves by about 20%, and accuracy is the same as the original.
- **O3** - Enable boost mode; performance improves by about 30%, and accuracy drops by less than 3%. If O1 or O2 mode is set, the boost related library takes effect automatically.
- **"O0"** - Do not change.
- **"O1"** - Enable boost mode; performance improves by about 20%, and accuracy is the same as the original.
- **"O2"** - Enable boost mode; performance improves by about 30%, and accuracy drops by less than 3%. If "O1" or "O2" mode is set, the boost related library takes effect automatically.
- **cast_model_type** (mindspore.dtype) - Supports float16 and float32. If this parameter is set, the network is cast to the given data type rather than according to the configured level.
- **keep_batchnorm_fp32** (bool) - When the network is cast to float16, keep Batchnorm running in float32. Setting level does not affect this property.
- **loss_scale_manager** (Union[None, LossScaleManager]) - If None, the loss is not scaled; otherwise the loss is scaled according to `LossScaleManager`. Setting `level` does not affect this property.
- **keep_batchnorm_fp32** (bool) - When the network is cast to float16, if set to True, BatchNorm keeps running in float32. Setting level does not affect this property.
- **loss_scale_manager** (Union[None, LossScaleManager]) - If not None, it must be a subclass of :class:`mindspore.LossScaleManager`, used to scale the loss (loss scale). Setting level does not affect this property.
**Raises:**
- **ValueError** - Auto mixed precision is only supported on GPU and Ascend. If the device is CPU, a `ValueError` is raised.
- **ValueError** - If the device is CPU, the property `loss_scale_manager` can only be set to `None` or `FixedLossScaleManager`.
- **ValueError** - On CPU, the property `loss_scale_manager` is not `None` or `FixedLossScaleManager` (with the property `drop_overflow_update=False`).
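Not part of this commit: a hedged sketch of calling the documented `mindspore.build_train_network` (the stand-in network, loss function and optimizer are assumptions)::

    import mindspore as ms
    import mindspore.nn as nn

    net = nn.Dense(16, 10)
    loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True)
    opt = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9)
    # level="O2" casts the network to float16 and enables dynamic loss scaling
    train_net = ms.build_train_network(net, opt, loss_fn=loss, level="O2")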

View File

@ -3,7 +3,7 @@ mindspore.nn.Adagrad
.. py:class:: mindspore.nn.Adagrad(*args, **kwargs)
Implements the Adagrad algorithm with the ApplyAdagrad operator.
Implementation of the Adagrad algorithm.
Adagrad is used for online learning and stochastic optimization.
Refer to the paper `Efficient Learning using Forward-Backward Splitting <https://proceedings.neurips.cc/paper/2009/file/621bf66ddb7c962aa0d22ac97d69b793-Paper.pdf>`_.
@ -12,7 +12,7 @@ mindspore.nn.Adagrad
.. math::
\begin{array}{ll} \\
h_{t+1} = h_{t} + g\\
h_{t+1} = h_{t} + g*g\\
w_{t+1} = w_{t} - lr*\frac{1}{\sqrt{h_{t+1}}}*g
\end{array}
@ -20,7 +20,7 @@ mindspore.nn.Adagrad
:math:`lr` represents `learning_rate`, and :math:`w` represents `params`.
.. note::
When the parameters are not grouped, the `weight_decay` configured in the optimizer is applied to the network parameters whose names contain "beta" or "gamma"; the weight decay strategy can be adjusted by grouping the network parameters. When grouping, each group of network parameters can be configured with `weight_decay`; if not configured, the group uses the `weight_decay` configured in the optimizer.
.. include:: mindspore.nn.optim_note_weight_decay.rst
**Parameters:**
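Not part of this diff: a minimal constructor sketch (the stand-in network is an assumption)::

    import mindspore.nn as nn

    net = nn.Dense(16, 10)
    optim = nn.Adagrad(net.trainable_params(), learning_rate=0.1)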

View File

@ -3,7 +3,7 @@ mindspore.nn.Adam
.. py:class:: mindspore.nn.Adam(*args, **kwargs)
Updates gradients by the Adaptive Moment Estimation (Adam) algorithm.
Implementation of the Adaptive Moment Estimation (Adam) algorithm.
Refer to the paper `Adam: A Method for Stochastic Optimization <https://arxiv.org/abs/1412.6980>`_.

View File

@ -3,7 +3,7 @@ mindspore.nn.AdamOffload
.. py:class:: mindspore.nn.AdamOffload(params, learning_rate=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, use_locking=False, use_nesterov=False, weight_decay=0.0, loss_scale=1.0)
This optimizer runs the Adam optimization algorithm on the host CPU, while the device only updates the network parameters, minimizing the memory cost. Although this adds performance overhead, the optimizer can run larger models.
This optimizer runs the Adam optimization algorithm on the host CPU, while the device only updates the network parameters, minimizing the memory cost. Although this adds performance overhead, the optimizer can be used to run larger models.
For the Adam algorithm, see `Adam: A Method for Stochastic Optimization <https://arxiv.org/abs/1412.6980>`_.

View File

@ -3,7 +3,7 @@ mindspore.nn.AdamWeightDecay
.. py:class:: mindspore.nn.AdamWeightDecay(params, learning_rate=1e-3, beta1=0.9, beta2=0.999, eps=1e-6, weight_decay=0.0)
Implements the Adam algorithm with weight decay.
Implementation of the Adam algorithm with weight decay.
.. math::
\begin{array}{ll} \\

View File

@ -3,21 +3,21 @@ mindspore.nn.DynamicLossScaleUpdateCell
.. py:class:: mindspore.nn.DynamicLossScaleUpdateCell(loss_scale_value, scale_factor, scale_window)
Cell used to dynamically update the gradient amplification coefficient (loss scale).
Cell used to dynamically update the loss scale coefficient (loss scale).
When training with the gradient amplification feature, the initial gradient amplification coefficient is `loss_scale_value`. In each training step, when overflow occurs, the gradient amplification coefficient is decreased by the formula `loss_scale`/`scale_factor`. If there is no overflow for `scale_window` consecutive steps, the gradient amplification coefficient is increased by `loss_scale` * `scale_factor`.
When training with the mixed precision feature, the initial loss scale coefficient is `loss_scale_value`. In each training step, when overflow occurs, the loss scale coefficient is decreased by the formula `loss_scale`/`scale_factor`. If there is no overflow for `scale_window` consecutive steps, the loss scale coefficient is increased by `loss_scale` * `scale_factor`.
This class is the return value of the `get_update_cell` method of :class:`mindspore.nn.DynamicLossScaleManager`. During training, the class :class:`mindspore.TrainOneStepWithLossScaleCell` calls this Cell to update the gradient amplification coefficient.
This class is the return value of the `get_update_cell` method of :class:`mindspore.nn.DynamicLossScaleManager`. During training, the class :class:`mindspore.TrainOneStepWithLossScaleCell` calls this Cell to update the loss scale coefficient.
**Parameters:**
- **loss_scale_value** (float) - The initial gradient amplification coefficient.
- **loss_scale_value** (float) - The initial loss scale coefficient.
- **scale_factor** (int) - The increase/decrease factor.
- **scale_window** (int) - The maximum number of consecutive training steps without overflow before the gradient amplification coefficient is increased.
- **scale_window** (int) - The maximum number of consecutive training steps without overflow before the loss scale coefficient is increased.
**Inputs:**
- **loss_scale** (Tensor) - The gradient amplification coefficient during training, with shape :math:`()`.
- **loss_scale** (Tensor) - The loss scale coefficient during training, a scalar.
- **overflow** (bool) - Whether overflow occurs.
**Outputs:**
@ -59,4 +59,4 @@ mindspore.nn.DynamicLossScaleUpdateCell
.. py:method:: get_loss_scale()
Get the current gradient amplification coefficient.
Get the current loss scale coefficient.
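Not part of this commit: a hedged sketch of plugging this cell into a loss-scaled training wrapper (the stand-in network, loss function and optimizer are assumptions)::

    import mindspore.nn as nn

    net_with_loss = nn.WithLossCell(nn.Dense(16, 10), nn.SoftmaxCrossEntropyWithLogits(sparse=True))
    opt = nn.Momentum(net_with_loss.trainable_params(), learning_rate=0.01, momentum=0.9)
    update_cell = nn.DynamicLossScaleUpdateCell(loss_scale_value=2**12, scale_factor=2, scale_window=1000)
    # each call of train_net returns (loss, overflow flag, current loss scale)
    train_net = nn.TrainOneStepWithLossScaleCell(net_with_loss, opt, scale_sense=update_cell)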

View File

@ -2,7 +2,7 @@ mindspore.nn.FTRL
=================
.. py:class:: mindspore.nn.FTRL(*args, **kwargs)
Implements the FTRL algorithm with the ApplyFtrl operator.
Implementation of the FTRL algorithm.
FTRL is an online convex optimization algorithm that adaptively chooses its regularization function based on the loss functions. See the paper `Adaptive Bound Optimization for Online Convex Optimization <https://arxiv.org/abs/1002.4908>`_. For the engineering document, refer to `Ad Click Prediction: a View from the Trenches <https://www.eecs.tufts.edu/~dsculley/papers/ad-click-prediction.pdf>`_.

View File

@ -3,17 +3,17 @@ mindspore.nn.FixedLossScaleUpdateCell
.. py:class:: mindspore.nn.FixedLossScaleUpdateCell(loss_scale_value)
Cell with a fixed gradient amplification coefficient.
Cell with a fixed loss scale coefficient.
This class is the return value of the `get_update_cell` method of :class:`mindspore.nn.FixedLossScaleManager`. During training, the class :class:`mindspore.TrainOneStepWithLossScaleCell` calls this Cell.
**Parameters:**
- **loss_scale_value** (float) - The initial gradient amplification coefficient.
- **loss_scale_value** (float) - The initial loss scale coefficient.
**Inputs:**
- **loss_scale** (Tensor) - The gradient amplification coefficient during training, with shape :math:`()`. In the current class, this value is ignored.
- **loss_scale** (Tensor) - The loss scale coefficient during training, a scalar. In the current class, this value is ignored.
- **overflow** (bool) - Whether overflow occurs.
**Outputs:**
@ -54,4 +54,4 @@ mindspore.nn.FixedLossScaleUpdateCell
.. py:method:: get_loss_scale()
Get the current gradient amplification coefficient.
Get the current loss scale coefficient.

View File

@ -3,7 +3,7 @@ mindspore.nn.LARS
.. py:class:: mindspore.nn.LARS(*args, **kwargs)
Implements the LARS algorithm with the LARSUpdate operator.
Implementation of the LARS algorithm.
LARS is an optimization algorithm employing a large-batch optimization technique. See the paper `LARGE BATCH TRAINING OF CONVOLUTIONAL NETWORKS <https://arxiv.org/abs/1708.03888>`_.

View File

@ -3,7 +3,7 @@ mindspore.nn.Lamb
.. py:class:: mindspore.nn.Lamb(*args, **kwargs)
LAMB (Layer-wise Adaptive Moments optimizer for Batching training) algorithm optimizer.
Implementation of the LAMB (Layer-wise Adaptive Moments optimizer for Batching training) algorithm.
LAMB is an optimization algorithm employing a layer-wise adaptive large batch optimization technique. See the paper `LARGE BATCH OPTIMIZATION FOR DEEP LEARNING: TRAINING BERT IN 76 MINUTES <https://arxiv.org/abs/1904.00962>`_.

View File

@ -3,7 +3,7 @@ mindspore.nn.LazyAdam
.. py:class:: mindspore.nn.LazyAdam(*args, **kwargs)
Updates gradients by the Adaptive Moment Estimation (Adam) algorithm. See the paper `Adam: A Method for Stochastic Optimization <https://arxiv.org/abs/1412.6980>`_.
Implementation of the Adaptive Moment Estimation (Adam) algorithm. See the paper `Adam: A Method for Stochastic Optimization <https://arxiv.org/abs/1412.6980>`_.
When the gradients are sparse, this optimizer applies the lazy Adam algorithm.

View File

@ -3,7 +3,7 @@ mindspore.nn.Momentum
.. py:class:: mindspore.nn.Momentum(*args, **kwargs)
Momentum algorithm optimizer.
Implementation of the Momentum algorithm.
For more details, see the paper `On the importance of initialization and momentum in deep learning <https://dl.acm.org/doi/10.5555/3042817.3043064>`_.

View File

@ -3,7 +3,7 @@ mindspore.nn.ProximalAdagrad
.. py:class:: mindspore.nn.ProximalAdagrad(*args, **kwargs)
Implements the ProximalAdagrad algorithm with the ApplyProximalAdagrad operator.
Implementation of the ProximalAdagrad algorithm.
ProximalAdagrad is used for online learning and stochastic optimization.
Refer to the paper `Efficient Learning using Forward-Backward Splitting <http://papers.nips.cc//paper/3793-efficient-learning-using-forward-backward-splitting.pdf>`_.

View File

@ -3,7 +3,7 @@ mindspore.nn.RMSProp
.. py:class:: mindspore.nn.RMSProp(*args, **kwargs)
Implements the Root Mean Square Propagation (RMSProp) algorithm.
Implementation of the Root Mean Square Propagation (RMSProp) algorithm.
Updates `params` according to the RMSProp algorithm; see page 29 of [http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf] for details.

View File

@ -3,7 +3,7 @@ mindspore.nn.SGD
.. py:class:: mindspore.nn.SGD(*args, **kwargs)
Implements stochastic gradient descent. Momentum is optional.
Implementation of stochastic gradient descent. Momentum is optional.
For an introduction to SGD, see `SGD <https://en.wikipedia.org/wiki/Stochastic_gradient_dencent>`_.

View File

@ -3,16 +3,16 @@ mindspore.nn.TrainOneStepWithLossScaleCell
.. py:class:: mindspore.nn.TrainOneStepWithLossScaleCell(network, optimizer, scale_sense)
Training network with the gradient amplification (loss scale) feature.
Training network with the mixed precision feature.
Implements a single training step with gradient amplification. It takes the network, the optimizer and a Cell (or a Tensor) used to update the gradient amplification coefficient as arguments. The gradient amplification coefficient can be updated on the host side or the device side.
To update it on the host side, use a Tensor as `scale_sense`; otherwise, use a Cell instance that can update the gradient amplification coefficient as `scale_sense`.
Implements a single training step with loss scaling (loss scale). It takes the network, the optimizer and a Cell (or a Tensor) used to update the loss scale coefficient (loss scale) as arguments. The loss scale coefficient can be updated on the host side or the device side.
To update it on the host side, use a Tensor as `scale_sense`; otherwise, use a Cell instance that can update the loss scale coefficient as `scale_sense`.
**Parameters:**
- **network** (Cell) - The training network. Only single-output networks are supported.
- **optimizer** (Cell) - The optimizer used to update the network parameters.
- **scale_sense** (Union[Tensor, Cell]) - If this value is a Cell, `TrainOneStepWithLossScaleCell` calls it to update the gradient amplification coefficient. If this value is a Tensor, `set_sense_scale` can be called to update the gradient amplification coefficient; the shape is :math:`()` or :math:`(1,)`.
- **scale_sense** (Union[Tensor, Cell]) - If this value is a Cell, `TrainOneStepWithLossScaleCell` calls it to update the loss scale coefficient. If this value is a Tensor, `set_sense_scale` can be called to update the loss scale coefficient; the shape is :math:`()` or :math:`(1,)`.
**Inputs:**
@ -20,11 +20,11 @@ mindspore.nn.TrainOneStepWithLossScaleCell
**Outputs:**
Tuple of 3 Tensors: the loss value, the overflow status and the current gradient amplification coefficient.
Tuple of 3 Tensors: the loss value, the overflow status and the current loss scale coefficient.
- **loss** (Tensor) - A Tensor with shape :math:`()`.
- **overflow** (Tensor) - A Tensor with shape :math:`()`, of type bool.
- **loss scale** (Tensor) - A Tensor with shape :math:`()`.
- **loss** (Tensor) - A scalar, the loss value.
- **overflow** (Tensor) - A scalar of type bool, indicating whether overflow occurred.
- **loss scale** (Tensor) - The loss scale coefficient, with shape :math:`()` or :math:`(1,)`.
**Raises:**
@ -94,7 +94,7 @@ mindspore.nn.TrainOneStepWithLossScaleCell
.. py:method:: process_loss_scale(overflow)
Compute the gradient amplification coefficient according to the overflow status.
Compute the loss scale coefficient according to the overflow status.
This interface can be reused when inheriting this class to customize a training network.
@ -113,7 +113,7 @@ mindspore.nn.TrainOneStepWithLossScaleCell
**Parameters:**
**sens** (Tensor) - The new gradient amplification coefficient; its shape and type need to be the same as the original `scale_sense`.
**sens** (Tensor) - The new loss scale coefficient; its shape and type need to be the same as the original `scale_sense`.
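Not part of this diff: a hedged sketch of host-side updating with a Tensor `scale_sense` (the stand-in network pieces are assumptions)::

    import mindspore as ms
    import mindspore.nn as nn
    from mindspore import Tensor

    net_with_loss = nn.WithLossCell(nn.Dense(16, 10), nn.SoftmaxCrossEntropyWithLogits(sparse=True))
    opt = nn.Momentum(net_with_loss.trainable_params(), learning_rate=0.01, momentum=0.9)
    train_net = nn.TrainOneStepWithLossScaleCell(net_with_loss, opt, scale_sense=Tensor(1024.0, ms.float32))
    train_net.set_sense_scale(Tensor(2048.0, ms.float32))  # replace the loss scale from the host side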
.. py:method:: start_overflow_check(pre_cond, compute_input)

View File

@ -9,7 +9,7 @@ mindspore.nn.WithLossCell
**Parameters:**
- **backbone** (Cell) - The target network to wrap.
- **backbone** (Cell) - The backbone network to wrap.
- **loss_fn** (Cell) - The loss function used to compute the loss.
**Inputs:**
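Not part of this commit: a hedged end-to-end sketch (the shapes and toy data are assumptions)::

    import numpy as np
    import mindspore.nn as nn
    from mindspore import Tensor

    backbone = nn.Dense(4, 3)
    loss_fn = nn.SoftmaxCrossEntropyWithLogits(sparse=True)
    net_with_loss = nn.WithLossCell(backbone, loss_fn)
    data = Tensor(np.random.rand(2, 4).astype(np.float32))
    label = Tensor(np.array([0, 2]).astype(np.int32))
    print(net_with_loss(data, label))  # forward pass returns the scalar loss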

View File

@ -1,2 +1 @@
When the parameters are not grouped, the `weight_decay` configured in the optimizer is applied to the network parameters whose names contain "beta" or "gamma"; the weight decay strategy can be adjusted by grouping the network parameters. When grouping, each group of network parameters can be configured with `weight_decay`; if not configured, the group uses the `weight_decay` configured in the optimizer.
When the parameters are not grouped, the `weight_decay` configured in the optimizer is applied to the network parameters whose names do not contain "beta" or "gamma"; when the parameters are grouped, the weight decay strategy can be adjusted per group. When grouping, each group of network parameters can be configured with `weight_decay`; if not configured, the group uses the `weight_decay` configured in the optimizer.
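Not part of this diff: a hedged sketch of per-group `weight_decay` (the weight/bias split below is an illustrative convention)::

    import mindspore.nn as nn

    net = nn.Dense(16, 10)
    weights = [p for p in net.trainable_params() if 'bias' not in p.name]
    biases = [p for p in net.trainable_params() if 'bias' in p.name]
    group_params = [{'params': weights, 'weight_decay': 0.01},
                    {'params': biases}]  # no per-group value: falls back to the optimizer-level weight_decay
    optim = nn.Momentum(group_params, learning_rate=0.1, momentum=0.9, weight_decay=0.0)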

View File

@ -6,4 +6,4 @@
.. py:method:: unique
:property:
This property indicates whether gradient deduplication is performed in the optimizer, which is usually used for sparse networks. Set it to True if the gradients are sparse. Set it to False if the forward sparse network has already deduplicated the weights, i.e. the gradients are dense. If it is not set, the default value is True.
This property indicates whether gradient deduplication is performed in the optimizer, which is usually used for sparse networks. Set it to True if the gradients are sparse. Set it to False if the forward sparse network has already deduplicated the weights, i.e. the gradients are dense. If nothing is configured, the default is True.

View File

@ -38,7 +38,7 @@ def _check_param_value(accum, update_slots, prim_name=None):
class Adagrad(Optimizer):
r"""
Implements the Adagrad algorithm with ApplyAdagrad Operator.
Implements the Adagrad algorithm.
Adagrad is an online Learning and Stochastic Optimization.
Refer to paper `Efficient Learning using Forward-Backward Splitting
@ -49,7 +49,7 @@ class Adagrad(Optimizer):
.. math::
\begin{array}{ll} \\
h_{t+1} = h_{t} + g\\
h_{t+1} = h_{t} + g*g\\
w_{t+1} = w_{t} - lr*\frac{1}{\sqrt{h_{t+1}}}*g
\end{array}

View File

@ -188,7 +188,7 @@ def _check_param_value(beta1, beta2, eps, prim_name):
class Adam(Optimizer):
r"""
Updates gradients by the Adaptive Moment Estimation (Adam) algorithm.
Implements the Adaptive Moment Estimation (Adam) algorithm.
The Adam optimizer can dynamically adjust the learning rate of each parameter using the first-order
moment estimation and the second-order moment estimation of the gradient.

View File

@ -75,7 +75,7 @@ def _check_param(initial_accum, lr_power, l1, l2, use_locking, prim_name=None):
class FTRL(Optimizer):
r"""
Implements the FTRL algorithm with ApplyFtrl Operator.
Implements the FTRL algorithm.
FTRL is an online convex optimization algorithm that adaptively chooses its regularization function
based on the loss functions. Refer to paper `Adaptive Bound Optimization for Online Convex Optimization

View File

@ -172,7 +172,7 @@ def _check_param_value(beta1, beta2, eps, prim_name):
class Lamb(Optimizer):
r"""
An optimizer that implements the Lamb(Layer-wise Adaptive Moments optimizer for Batching training) algorithm.
Implements the Lamb(Layer-wise Adaptive Moments optimizer for Batching training) algorithm.
LAMB is an optimization algorithm employing a layerwise adaptive large batch optimization technique.
Refer to the paper `LARGE BATCH OPTIMIZATION FOR DEEP LEARNING: TRAINING BERT IN 76

View File

@ -49,7 +49,7 @@ def _check_param_value(optimizer, epsilon, coefficient, use_clip, prim_name):
class LARS(Optimizer):
r"""
Implements the LARS algorithm with LARSUpdate Operator.
Implements the LARS algorithm.
LARS is an optimization algorithm employing a large batch optimization technique. Refer to paper `LARGE BATCH
TRAINING OF CONVOLUTIONAL NETWORKS <https://arxiv.org/abs/1708.03888>`_.

View File

@ -106,7 +106,7 @@ def _check_param_value(beta1, beta2, eps, weight_decay, prim_name):
class LazyAdam(Optimizer):
r"""
Updates gradients by the Adaptive Moment Estimation (Adam) algorithm. The Adam algorithm is proposed
Implements the Adaptive Moment Estimation (Adam) algorithm. The Adam algorithm is proposed
in `Adam: A Method for Stochastic Optimization <https://arxiv.org/abs/1412.6980>`_.
This optimizer will apply a lazy adam algorithm when gradient is sparse.

View File

@ -40,7 +40,7 @@ def _tensor_run_opt_ext(opt, momentum, learning_rate, gradient, weight, moment,
class Momentum(Optimizer):
r"""
An optimizer that implements the Momentum algorithm.
Implements the Momentum algorithm.
Refer to the paper `On the importance of initialization and momentum in deep
learning <https://dl.acm.org/doi/10.5555/3042817.3043064>`_ for more details.

View File

@ -54,7 +54,7 @@ def _check_param_value(accum, l1, l2, use_locking, prim_name=None):
class ProximalAdagrad(Optimizer):
r"""
Implements the ProximalAdagrad algorithm with ApplyProximalAdagrad Operator.
Implements the ProximalAdagrad algorithm.
ProximalAdagrad is an online Learning and Stochastic Optimization.
Refer to paper `Efficient Learning using Forward-Backward Splitting

View File

@ -73,7 +73,7 @@ class WithLossCell(Cell):
the computed loss will be returned.
Args:
backbone (Cell): The target network to wrap.
backbone (Cell): The backbone network to wrap.
loss_fn (Cell): The loss function used to compute loss.
Inputs:

View File

@ -245,9 +245,9 @@ class TrainOneStepWithLossScaleCell(TrainOneStepCell):
Outputs:
Tuple of 3 Tensor, the loss, overflow flag and current loss scale value.
- **loss** (Tensor) - Tensor with shape :math:`()`.
- **overflow** (Tensor) - Tensor with shape :math:`()`, type is bool.
- **loss scale** (Tensor) - Tensor with shape :math:`()`
- **loss** (Tensor) - A scalar, the loss value.
- **overflow** (Tensor) - A scalar, whether overflow occurs or not, the type is bool.
- **loss scale** (Tensor) - The loss scale value, the shape is :math:`()` or :math:`(1,)`.
Raises:
TypeError: If `scale_sense` is neither Cell nor Tensor.

View File

@ -140,45 +140,46 @@ def build_train_network(network, optimizer, loss_fn=None, level='O0', boost_leve
Args:
network (Cell): Definition of the network.
loss_fn (Union[None, Cell]): Definition of the loss_fn. If None, the `network` should have the loss inside.
loss_fn (Union[None, Cell]): Define the loss function. If None, the `network` should have the loss inside.
Default: None.
optimizer (Optimizer): Optimizer to update the Parameter.
optimizer (Optimizer): Define the optimizer to update the Parameter.
level (str): Supports ["O0", "O2", "O3", "auto"]. Default: "O0".
- O0: Do not change.
- O2: Cast network to float16, keep batchnorm and `loss_fn` (if set) run in float32,
- "O0": Do not change.
- "O2": Cast network to float16, keep batchnorm and `loss_fn` (if set) run in float32,
using dynamic loss scale.
- O3: Cast network to float16, with additional property `keep_batchnorm_fp32=False` .
- auto: Set to level to recommended level in different devices. Set level to O2 on GPU, Set
level to O3 Ascend. The recommended level is chosen by the export experience, cannot
always general. User should specify the level for special network.
- "O3": Cast network to float16, with additional property `keep_batchnorm_fp32=False` .
- auto: Set to level to recommended level in different devices. Set level to "O2" on GPU, Set
level to "O3" Ascend. The recommended level is chosen by the export experience, not applicable to all
scenarios. User should specify the level for special network.
O2 is recommended on GPU, O3 is recommended on Ascend. Property of `keep_batchnorm_fp32`, `cast_model_type`
and `loss_scale_manager` determined by `level` setting may be overwritten by settings in `kwargs`.
"O2" is recommended on GPU, "O3" is recommended on Ascend. Property of `keep_batchnorm_fp32`,
`cast_model_type` and `loss_scale_manager` determined by `level` setting may be overwritten by settings in
`kwargs`.
boost_level (str): Option for argument `level` in `mindspore.boost` , level for boost mode
training. Supports ["O0", "O1", "O2"]. Default: "O0".
- O0: Do not change.
- O1: Enable the boost mode, the performance is improved by about 20%, and
- "O0": Do not change.
- "O1": Enable the boost mode, the performance is improved by about 20%, and
the accuracy is the same as the original accuracy.
- O2: Enable the boost mode, the performance is improved by about 30%, and
- "O2": Enable the boost mode, the performance is improved by about 30%, and
the accuracy is reduced by less than 3%.
If O1 or O2 mode is set, the boost related library will take effect automatically.
If "O1" or "O2" mode is set, the boost related library will take effect automatically.
cast_model_type (:class:`mindspore.dtype`): Supports `mstype.float16` or `mstype.float32` . If set, the
network will be casted to `cast_model_type` ( `mstype.float16` or `mstype.float32` ), but not to be casted
to the type determined by `level` setting.
keep_batchnorm_fp32 (bool): Keep Batchnorm run in `float32` when the network is set to cast to `float16` . If
set, the `level` setting will take no effect on this property.
loss_scale_manager (Union[None, LossScaleManager]): If None, not scale the loss, otherwise scale the loss by
`LossScaleManager` . If set, the `level` setting will take no effect on this property.
loss_scale_manager (Union[None, LossScaleManager]): If not None, must be subclass of
:class:`mindspore.LossScaleManager` for scaling the loss. If set, the `level` setting will take no effect
on this property.
Raises:
ValueError: Auto mixed precision only supported on device GPU and Ascend. If device is CPU, a `ValueError`
exception will be raised.
ValueError: If device is CPU, property `loss_scale_manager` only can be set as `None` or `FixedLossScaleManager`
(with property `drop_overflow_update=False` ), or a `ValueError` exception will be raised.
ValueError: If device is CPU, property `loss_scale_manager` is not `None` or `FixedLossScaleManager`
(with property `drop_overflow_update=False` ).
"""
validator.check_value_type('network', network, nn.Cell)
validator.check_value_type('optimizer', optimizer, (nn.Optimizer, boost.FreezeOpt))

View File

@ -20,7 +20,8 @@ from .. import nn
class LossScaleManager:
"""
Loss scale (Magnification factor of gradients when mix precision is used) manager abstract class.
Loss scale (Magnification factor of gradients when mix precision is used) manager abstract class when using
mixed precision.
Derived class needs to implement all of its methods. `get_loss_scale` is used to get current loss scale value.
`update_loss_scale` is used to update loss scale value, `update_loss_scale` will be called during the training.

View File

@ -114,15 +114,15 @@ class Model:
amp_level (str): Option for argument `level` in :func:`mindspore.build_train_network`, level for mixed
precision training. Supports ["O0", "O2", "O3", "auto"]. Default: "O0".
- O0: Do not change.
- O2: Cast network to float16, keep batchnorm run in float32, using dynamic loss scale.
- O3: Cast network to float16, the batchnorm is also cast to float16, loss scale will not be used.
- auto: Set level to recommended level in different devices. Set level to O2 on GPU, set
level to O3 on Ascend. The recommended level is chosen by the export experience, not applicable to all
- "O0": Do not change.
- "O2": Cast network to float16, keep BatchNorm run in float32, using dynamic loss scale.
- "O3": Cast network to float16, the BatchNorm is also cast to float16, loss scale will not be used.
- auto: Set level to recommended level in different devices. Set level to "O2" on GPU, set
level to "O3" on Ascend. The recommended level is chosen by the export experience, not applicable to all
scenarios. User should specify the level for special network.
O2 is recommended on GPU, O3 is recommended on Ascend.
The batchnorm strategy can be changed by `keep_batchnorm_fp32` settings in `kwargs`. `keep_batchnorm_fp32`
"O2" is recommended on GPU, "O3" is recommended on Ascend.
The BatchNorm strategy can be changed by `keep_batchnorm_fp32` settings in `kwargs`. `keep_batchnorm_fp32`
must be a bool. The loss scale strategy can be changed by `loss_scale_manager` setting in `kwargs`.
`loss_scale_manager` should be a subclass of :class:`mindspore.LossScaleManager`.
The more detailed explanation of `amp_level` setting can be found at `mindspore.build_train_network`.
@ -130,10 +130,10 @@ class Model:
boost_level (str): Option for argument `level` in `mindspore.boost`, level for boost mode
training. Supports ["O0", "O1", "O2"]. Default: "O0".
- O0: Do not change.
- O1: Enable the boost mode, the performance is improved by about 20%, and
- "O0": Do not change.
- "O1": Enable the boost mode, the performance is improved by about 20%, and
the accuracy is the same as the original accuracy.
- O2: Enable the boost mode, the performance is improved by about 30%, and
- "O2": Enable the boost mode, the performance is improved by about 30%, and
the accuracy is reduced by less than 3%.
If you want to config boost mode by yourself, you can set boost_config_dict as `boost.py`.
@ -906,11 +906,10 @@ class Model:
Configure to pynative mode or CPU, the evaluating process will be performed with dataset non-sink mode.
Note:
If dataset_sink_mode is True, data will be sent to device. If the device is Ascend, features
If dataset_sink_mode is True, data will be sent to device. At this point, the dataset will be bound to this
model, so the dataset cannot be used by other models. If the device is Ascend, features
of data will be transferred one by one. The limitation of data transmission per time is 256M.
If dataset_sink_mode is True, dataset will be bound to this model and cannot be used by other models.
The interface builds the computational graphs and then executes the computational graphs. However, when
the `Model.build` is executed first, it only performs the graphs execution.
@ -1107,7 +1106,8 @@ class Model:
Batch data should be put together in one tensor.
Args:
predict_data (Tensor): One tensor or multiple tensors of predict data.
predict_data (Optional[Tensor, list[Tensor], tuple[Tensor]]): The predict data, can be a single tensor,
a list of tensors, or a tuple of tensors.
Returns:
Dict, Parameter layout dictionary used for load distributed checkpoint.