forked from mindspore-Ecosystem/mindspore
!28817 modify api comments
Merge pull request !28817 from wangnan39/code_docs_modify_api_comments
commit 35ca76afb9
@@ -13,7 +13,7 @@ mindspore.DatasetHelper

 **Parameters:**

 - **dataset** (Dataset) - The training dataset iterator. The dataset can be generated by the dataset generator APIs in :class:`mindspore.dataset`, such as :class:`mindspore.dataset.ImageFolderDataset`.
- - **dataset_sink_mode** (bool) - If True, use :class:`mindspore.ops.GetNext` to fetch data through the data channel on the device (Device); otherwise, iterate over the dataset directly on the host to obtain data. Default: True.
+ - **dataset_sink_mode** (bool) - If True, use :class:`mindspore.ops.GetNext` to fetch data through the data channel on the device (Device); otherwise, iterate over the dataset directly on the host (Host) to obtain data. Default: True.
 - **sink_size** (int) - Controls the amount of data in each sink. If `sink_size` is -1, sink the complete dataset for each epoch. If `sink_size` is greater than 0, sink `sink_size` data for each epoch. Default: -1.
 - **epoch_num** (int) - Controls the number of epochs of data to be sent. Default: 1.
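A minimal construction sketch for the parameters above; the toy NumpySlicesDataset and the host-side iteration are illustrative assumptions, not something this diff prescribes:

```python
import numpy as np
import mindspore as ms
import mindspore.dataset as ds

# A tiny in-memory dataset with two columns, only to have something to wrap.
x = np.random.randn(8, 4).astype(np.float32)
y = np.random.randn(8, 1).astype(np.float32)
train_ds = ds.NumpySlicesDataset((x, y), column_names=["data", "label"]).batch(4)

# Non-sink mode keeps the loop on the host, so this sketch also runs on CPU;
# sink_size=-1 means "the complete dataset per epoch".
helper = ms.DatasetHelper(train_ds, dataset_sink_mode=False, sink_size=-1, epoch_num=1)
for item in helper:
    print(len(item))   # one entry per dataset column
```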
@@ -3,7 +3,7 @@ mindspore.DynamicLossScaleManager

 .. py:class:: mindspore.DynamicLossScaleManager(init_loss_scale=2**24, scale_factor=2, scale_window=2000)

- Manager that dynamically adjusts the gradient magnification factor, inherits from :class:`mindspore.LossScaleManager`.
+ Manager that dynamically adjusts the loss scale factor, inherits from :class:`mindspore.LossScaleManager`.

 **Parameters:**

@@ -3,7 +3,7 @@ mindspore.FixedLossScaleManager

 .. py:class:: mindspore.FixedLossScaleManager(loss_scale=128.0, drop_overflow_update=True)

- Manager with a fixed gradient magnification factor, inherits from :class:`mindspore.LossScaleManager`.
+ Manager with a fixed loss scale factor, inherits from :class:`mindspore.LossScaleManager`.

 **Parameters:**

@@ -3,7 +3,7 @@ mindspore.LossScaleManager

 .. py:class:: mindspore.LossScaleManager

- Abstract class for the manager of the mixed-precision gradient magnification factor (loss scale).
+ Abstract class used to manage the loss scale factor (loss scale) when mixed precision is used.

 Derived classes need to implement all methods of this class. `get_loss_scale` is used to get the current gradient magnification factor. `update_loss_scale` is used to update the gradient magnification factor and will be called during training. `get_update_cell` is used to get the `Cell` instance that updates the gradient magnification factor; the instance will be called during training. Currently the `get_update_cell` approach is mostly used.
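Since the abstract class only fixes the three-method interface described above, a minimal hedged sketch of a derived manager could look like the following; the constant-value behaviour is illustrative, not the library's FixedLossScaleManager:

```python
import mindspore as ms
from mindspore import nn

class ConstantLossScaleManager(ms.LossScaleManager):
    """Illustrative manager that always reports the same loss scale."""

    def __init__(self, loss_scale=1024.0):
        self._loss_scale = loss_scale

    def get_loss_scale(self):
        # Called when the training wrapper needs the current loss scale.
        return self._loss_scale

    def update_loss_scale(self, overflow):
        # A fixed manager ignores the overflow status.
        pass

    def get_update_cell(self):
        # Reuse the library's fixed-update Cell; a dynamic manager would
        # return nn.DynamicLossScaleUpdateCell instead.
        return nn.FixedLossScaleUpdateCell(self._loss_scale)
```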
@@ -15,20 +15,20 @@

 - **eval_indexes** (list) - Used when `eval_network` is defined. If `eval_indexes` is the default None, `Model` passes all outputs of `eval_network` to `metrics`. If `eval_indexes` is configured, it must contain three elements: the positions of the loss value, the prediction value and the label in the outputs of `eval_network`; in that case the loss value is passed to the loss metric, and the prediction value and label are passed to the other metrics. It is recommended to use `mindspore.nn.Metric.set_indexes` of the metric instead of `eval_indexes`. Default: None.
 - **amp_level** (str) - Option for the argument `level` in `mindspore.build_train_network`; `level` is the mixed-precision level. Supports ["O0", "O2", "O3", "auto"]. Default: "O0".

- - O0: No change.
- - O2: Cast the network to float16, keep batchnorm in float32, and use the dynamic gradient magnification (loss scale) strategy.
- - O3: Cast the network (including batchnorm) to float16 and do not use a gradient scaling strategy.
- - auto: Set the expert-recommended mixed-precision level for different processors, e.g. O2 on GPU and O3 on Ascend. This setting may not fit some scenarios; users are advised to set `amp_level` themselves according to the specific network model.
+ - "O0": No change.
+ - "O2": Cast the network to float16, keep BatchNorm in float32, and use the dynamic loss scale strategy.
+ - "O3": Cast the network (including BatchNorm) to float16 and do not use a loss scaling strategy.
+ - auto: Set the expert-recommended mixed-precision level for different processors, e.g. "O2" on GPU and "O3" on Ascend. This setting may not fit some scenarios; users are advised to set `amp_level` themselves according to the specific network model.

- O2 is recommended on GPU and O3 is recommended on Ascend.
- The batchnorm strategy can be changed by setting `keep_batchnorm_fp32` in `kwargs`; `keep_batchnorm_fp32` must be a bool. The gradient magnification strategy can be changed by setting `loss_scale_manager` in `kwargs`; `loss_scale_manager` must be a subclass of :class:`mindspore.LossScaleManager`,
+ "O2" is recommended on GPU and "O3" is recommended on Ascend.
+ The BatchNorm precision strategy can be changed by setting `keep_batchnorm_fp32` in `kwargs`; `keep_batchnorm_fp32` must be a bool. The loss scaling strategy can be changed by setting `loss_scale_manager` in `kwargs`; `loss_scale_manager` must be a subclass of :class:`mindspore.LossScaleManager`,
 for details about `amp_level`, see `mindpore.build_train_network`.

 - **boost_level** (str) – Option for the argument `level` in `mindspore.boost`, the level for boost mode training. Supports ["O0", "O1", "O2"]. Default: "O0".

- - O0: No change.
- - O1: Enable boost mode; performance improves by about 20% and precision stays the same.
- - O2: Enable boost mode; performance improves by about 30% and precision drops by about 3%.
+ - "O0": No change.
+ - "O1": Enable boost mode; performance improves by about 20% and accuracy stays the same.
+ - "O2": Enable boost mode; performance improves by about 30% and accuracy drops by less than 3%.

 If you want to configure boost mode yourself, you can set `boost_config_dict` as in `boost.py`.
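A hedged usage sketch that combines these options; the tiny network, loss and optimizer are placeholders chosen for illustration:

```python
import mindspore as ms
from mindspore import nn

net = nn.Dense(16, 4)                       # placeholder backbone
loss = nn.MSELoss()
opt = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9)

# "O2": float16 network with float32 BatchNorm and dynamic loss scaling by default;
# a loss_scale_manager passed through kwargs overrides the loss-scale strategy.
model = ms.Model(net, loss_fn=loss, optimizer=opt,
                 amp_level="O2",
                 boost_level="O0",
                 loss_scale_manager=ms.FixedLossScaleManager(loss_scale=1024.0,
                                                             drop_overflow_update=False))
```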
@@ -105,7 +105,8 @@

 When PyNative mode or the CPU processor is used, the evaluation process runs in non-sink mode.

 .. note::

- If `dataset_sink_mode` is set to True, the data will be sent to the processor. If the processor is Ascend, the data features will be transferred one by one, with an upper limit of 256M per transfer. If `dataset_sink_mode` is set to True, the dataset can only be used in the current model. This interface builds and executes the computational graph; if `Model.build` has been executed beforehand, it executes the graph directly without building it.
+ If `dataset_sink_mode` is set to True, the data will be sent to the processor. In this case the dataset is bound to the model and can only be used in the current model. If the processor is Ascend, the data features will be transferred one by one, with an upper limit of 256M per transfer.
+ This interface builds and executes the computational graph; if `Model.build` has been executed beforehand, it executes the graph directly without building it.

 **Parameters:**

@@ -115,7 +116,7 @@

 **Returns:**

- Dict, whose keys are the user-defined metric names and whose values are the evaluation results obtained in inference mode.
+ Dict, where key is the user-defined metric name and value is the evaluation result obtained by running in inference mode.

 **Examples:**
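A hedged sketch of the evaluation call; the placeholder network/dataset and the "mae" metric alias are assumptions made for illustration:

```python
import numpy as np
import mindspore as ms
import mindspore.dataset as ds
from mindspore import nn

x = np.random.randn(32, 16).astype(np.float32)
y = np.random.randn(32, 4).astype(np.float32)
eval_ds = ds.NumpySlicesDataset((x, y), column_names=["data", "label"]).batch(8)

net = nn.Dense(16, 4)
model = ms.Model(net, loss_fn=nn.MSELoss(), metrics={"mae"})

# Non-sink mode so the sketch also runs on CPU / PyNative.
result = model.eval(eval_ds, dataset_sink_mode=False)
print(result)   # a dict of metric name -> evaluation result, e.g. {'mae': 1.03}
```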
@@ -145,7 +146,7 @@

 **Parameters:**

- - **predict_data** (Tensor) – Predict data of one tensor or multiple tensors.
+ - **predict_data** (Tensor) – The predict samples; the data can be a single tensor, a list of tensors, or a tuple of tensors.

 **Returns:**
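A short hedged sketch of the corresponding call with a single tensor (placeholder network):

```python
import numpy as np
import mindspore as ms
from mindspore import nn

net = nn.Dense(16, 4)                       # placeholder network
model = ms.Model(net)

batch = ms.Tensor(np.random.randn(2, 16).astype(np.float32))
out = model.predict(batch)                  # runs the network in inference mode
print(out.shape)                            # (2, 4)
```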
@@ -7,27 +7,26 @@ mindspore.build_train_network

 **Parameters:**

- - **network** (Cell) – The MindSpore network structure.
- - **optimizer** (Optimizer) – The optimizer, used to update the parameters.
- - **loss_fn** (Union[None, Cell]) – Definition of the loss function. If None, the network structure should contain the loss function. Default: None.
+ - **network** (Cell) – Defines the network structure.
+ - **optimizer** (Optimizer) – Defines the optimizer, used to update the weight parameters.
+ - **loss_fn** (Union[None, Cell]) – Defines the loss function. If None, `network` should contain the loss function. Default: None.
 - **level** (str) – Supports ["O0", "O2", "O3", "auto"]. Default: "O0".

- - **O0** - No precision change.
- - **O2** - Run the network in float16; if the network contains `batchnorm` and `loss_fn`, keep them running in float32.
- - **O3** - Run the network in float16 and set `keep_batchnorm_fp32` to False.
- - **auto** - Set different levels for different backends: O2 on GPU and O3 on Ascend. The automatic option is a system recommendation and may not fit special scenarios; users can set it according to the actual network. O2 is recommended on GPU and O3 on Ascend. The `keep_batchnorm_fp32`, `cast_model_type` and `loss_scale_manager` properties are determined automatically by level and may be overwritten by `kwargs`.
+ - **"O0"** - No change.
+ - **"O2"** - Cast the network to float16, keep `BatchNorm` and `loss_fn` in float32, and use the dynamic loss scale strategy.
+ - **"O3"** - Cast the network to float16, do not use a loss scaling strategy, and set `keep_batchnorm_fp32` to False.
+ - **auto** - Set the expert-recommended mixed-precision level for different processors, e.g. "O2" on GPU and "O3" on Ascend. This setting may not fit some scenarios; users are advised to set `amp_level` themselves according to the specific network model. The `keep_batchnorm_fp32`, `cast_model_type` and `loss_scale_manager` properties are determined automatically by level.

 - **boost_level** (str) – Option for the argument `level` in `mindspore.boost`, the level for boost mode training. Supports ["O0", "O1", "O2"]. Default: "O0".

- - **O0** - No precision change.
- - **O2** - Enable boost mode; performance improves by about 20% and precision is the same as the original.
- - **O3** - Enable boost mode; performance improves by about 30% and accuracy drops by less than 3%. If O1 or O2 mode is set, the boost-related library takes effect automatically.
+ - **"O0"** - No change.
+ - **"O1"** - Enable boost mode; performance improves by about 20% and accuracy is the same as the original accuracy.
+ - **"O2"** - Enable boost mode; performance improves by about 30% and accuracy drops by less than 3%. If "O1" or "O2" mode is set, the boost-related library takes effect automatically.

 - **cast_model_type** (mindspore.dtype) – Supports float16 and float32. If set, the network is cast to the given data type rather than to the type determined by the `level` setting.
- - **keep_batchnorm_fp32** (bool) – Keep Batchnorm running in float32 when the network is cast to float16. Setting `level` does not affect this property.
- - **loss_scale_manager** (Union[None, LossScaleManager]) – If None, the loss is not scaled; otherwise the loss is scaled according to `LossScaleManager`. If set, `level` does not affect this property.
+ - **keep_batchnorm_fp32** (bool) – When the network is cast to float16 and this is set to True, BatchNorm keeps running in float32. Setting `level` does not affect this property.
+ - **loss_scale_manager** (Union[None, LossScaleManager]) – If not None, it must be a subclass of :class:`mindspore.LossScaleManager`, used for scaling the loss (loss scale). Setting `level` does not affect this property.

 **Raises:**

- - **ValueError** – Auto mixed precision is only supported on GPU and Ascend. If the device is CPU, a `ValueError` is raised.
- - **ValueError** - On CPU, the property `loss_scale_manager` can only be set to `None` or `FixedLossScaleManager`.
+ - **ValueError** - On CPU, the property `loss_scale_manager` is not `None` or `FixedLossScaleManager` (with the property `drop_overflow_update=False`).
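A hedged sketch of calling the function directly (placeholder network and optimizer; per the Raises section, on CPU only the default loss-scale setting with level "O0" would apply):

```python
import mindspore as ms
from mindspore import nn

net = nn.Dense(16, 4)
loss = nn.MSELoss()
opt = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9)

# Returns a training Cell wrapping network + loss + optimizer, cast according to `level`.
train_net = ms.build_train_network(net, opt, loss_fn=loss, level="O2")
```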
@@ -3,7 +3,7 @@ mindspore.nn.Adagrad

 .. py:class:: mindspore.nn.Adagrad(*args, **kwargs)

- Implements the Adagrad algorithm with the ApplyAdagrad operator.
+ Implementation of the Adagrad algorithm.

 Adagrad is used for online learning and stochastic optimization.
 Refer to the paper `Efficient Learning using Forward-Backward Splitting <https://proceedings.neurips.cc/paper/2009/file/621bf66ddb7c962aa0d22ac97d69b793-Paper.pdf>`_.

@@ -12,7 +12,7 @@ mindspore.nn.Adagrad

 .. math::
 \begin{array}{ll} \\
- h_{t+1} = h_{t} + g\\
+ h_{t+1} = h_{t} + g*g\\
 w_{t+1} = w_{t} - lr*\frac{1}{\sqrt{h_{t+1}}}*g
 \end{array}
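The corrected accumulator line is the squared-gradient sum that gives Adagrad its per-parameter step size. A plain NumPy sketch of one update step (illustrative only, not the ApplyAdagrad kernel; the small eps is added here for numerical safety and is not in the formula above):

```python
import numpy as np

def adagrad_step(w, h, g, lr=0.1, eps=1e-10):
    """One Adagrad update: accumulate g*g, then scale the step by 1/sqrt(h)."""
    h = h + g * g                       # the fixed formula: h_{t+1} = h_t + g*g
    w = w - lr * g / (np.sqrt(h) + eps)
    return w, h

w = np.array([1.0, -2.0])
h = np.zeros_like(w)                    # accumulator starts at zero here
g = np.array([0.5, 0.1])
w, h = adagrad_step(w, h, g)
print(w, h)                             # larger past gradients -> smaller future steps
```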
@@ -20,7 +20,7 @@ mindspore.nn.Adagrad

 :math:`lr` represents `learning_rate` and :math:`w` represents `params`.

 .. note::
- When the parameters are not grouped, the `weight_decay` configured in the optimizer is applied to the network parameters whose names contain "beta" or "gamma"; the weight decay strategy can be adjusted by grouping the network parameters. When grouping, each group of network parameters can configure its own `weight_decay`; if it is not configured, the `weight_decay` configured in the optimizer is used for that group.
+ .. include:: mindspore.nn.optim_note_weight_decay.rst

 **Parameters:**
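A hedged sketch of the parameter-grouping convention this note refers to, assuming the standard MindSpore list-of-dicts form; the split by parameter name is illustrative:

```python
from mindspore import nn

net = nn.Dense(16, 4)

# Split parameters into two groups: weights get weight decay, biases do not.
decay_params = [p for p in net.trainable_params() if "bias" not in p.name]
no_decay_params = [p for p in net.trainable_params() if "bias" in p.name]

group_params = [
    {"params": decay_params, "weight_decay": 1e-4},  # per-group weight_decay
    {"params": no_decay_params},                     # falls back to the optimizer's weight_decay
]
opt = nn.Adagrad(group_params, learning_rate=0.01, weight_decay=0.0)
```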
@@ -3,7 +3,7 @@ mindspore.nn.Adam

 .. py:class:: mindspore.nn.Adam(*args, **kwargs)

- Updates gradients by the Adaptive Moment Estimation (Adam) algorithm.
+ Implementation of the Adaptive Moment Estimation (Adam) algorithm.

 Refer to the paper `Adam: A Method for Stochastic Optimization <https://arxiv.org/abs/1412.6980>`_.

@@ -3,7 +3,7 @@ mindspore.nn.AdamOffload

 .. py:class:: mindspore.nn.AdamOffload(params, learning_rate=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, use_locking=False, use_nesterov=False, weight_decay=0.0, loss_scale=1.0)

- This optimizer runs the Adam algorithm on the host CPU, while only the update of the network parameters is executed on the device, minimizing memory cost. Although it adds performance overhead, the optimizer can run larger models.
+ This optimizer runs the Adam algorithm on the host CPU, while only the update of the network parameters is executed on the device, minimizing memory cost. Although it adds performance overhead, the optimizer can be used to run larger models.

 For the Adam algorithm, see `Adam: A Method for Stochastic Optimization <https://arxiv.org/abs/1412.6980>`_.

@@ -3,7 +3,7 @@ mindspore.nn.AdamWeightDecay

 .. py:class:: mindspore.nn.AdamWeightDecay(params, learning_rate=1e-3, beta1=0.9, beta2=0.999, eps=1e-6, weight_decay=0.0)

- Implements the Adam algorithm with weight decay.
+ Implementation of the Adam algorithm with weight decay.

 .. math::
 \begin{array}{ll} \\
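A hedged construction sketch for these Adam variants, with defaults taken from the signatures shown above (placeholder network):

```python
from mindspore import nn

net = nn.Dense(16, 4)

opt = nn.Adam(net.trainable_params(), learning_rate=1e-3, beta1=0.9, beta2=0.999)

# Same interface, but the Adam computation runs on the host CPU:
opt_offload = nn.AdamOffload(net.trainable_params(), learning_rate=1e-3)
```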
@@ -3,21 +3,21 @@ mindspore.nn.DynamicLossScaleUpdateCell

 .. py:class:: mindspore.nn.DynamicLossScaleUpdateCell(loss_scale_value, scale_factor, scale_window)

- Cell that dynamically updates the gradient magnification factor (loss scale).
+ Cell that dynamically updates the loss scale factor (loss scale).

- When training with gradient magnification, the initial gradient magnification factor is `loss_scale_value`. In each training step, when overflow occurs, the factor is decreased by `loss_scale`/`scale_factor`. If no overflow occurs for `scale_window` consecutive steps, the factor is increased by `loss_scale` * `scale_factor`.
+ When training with mixed precision, the initial loss scale value is `loss_scale_value`. In each training step, when overflow occurs, the loss scale is decreased by `loss_scale`/`scale_factor`. If no overflow occurs for `scale_window` consecutive steps, the loss scale is increased by `loss_scale` * `scale_factor`.

- This class is the return value of the `get_update_cell` method of :class:`mindspore.nn.DynamicLossScaleManager`. During training, :class:`mindspore.TrainOneStepWithLossScaleCell` calls this Cell to update the gradient magnification factor.
+ This class is the return value of the `get_update_cell` method of :class:`mindspore.nn.DynamicLossScaleManager`. During training, :class:`mindspore.TrainOneStepWithLossScaleCell` calls this Cell to update the loss scale.

 **Parameters:**

- - **loss_scale_value** (float) - The initial gradient magnification factor.
+ - **loss_scale_value** (float) - The initial loss scale.
 - **scale_factor** (int) - The increase/decrease factor.
- - **scale_window** (int) - The maximum number of consecutive training steps without overflow before the gradient magnification factor is increased.
+ - **scale_window** (int) - The maximum number of consecutive training steps without overflow before the loss scale is increased.

 **Inputs:**

- - **loss_scale** (Tensor) - The gradient magnification factor during training, with shape :math:`()`.
+ - **loss_scale** (Tensor) - The loss scale during training, a scalar.
 - **overflow** (bool) - Whether overflow occurs.

 **Outputs:**

@@ -59,4 +59,4 @@ mindspore.nn.DynamicLossScaleUpdateCell

 .. py:method:: get_loss_scale()

- Get the current gradient magnification factor.
+ Get the current loss scale.
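The update rule described above amounts to a few lines of control flow. A hedged plain-Python sketch of the same logic (not the library's Cell implementation; the lower bound of 1.0 is an assumption):

```python
def update_loss_scale(loss_scale, overflow, good_steps,
                      scale_factor=2, scale_window=2000, minimum_scale=1.0):
    """Mirror of the dynamic rule: shrink on overflow, grow after scale_window clean steps."""
    if overflow:
        loss_scale = max(loss_scale / scale_factor, minimum_scale)
        good_steps = 0
    else:
        good_steps += 1
        if good_steps >= scale_window:
            loss_scale *= scale_factor
            good_steps = 0
    return loss_scale, good_steps

scale, steps = 2.0 ** 24, 0
scale, steps = update_loss_scale(scale, overflow=True, good_steps=steps)
print(scale)   # 8388608.0, halved after an overflow
```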
@@ -2,7 +2,7 @@ mindspore.nn.FTRL
 =================

 .. py:class:: mindspore.nn.FTRL(*args, **kwargs)

- Implements the FTRL algorithm with the ApplyFtrl operator.
+ Implementation of the FTRL algorithm.

 FTRL is an online convex optimization algorithm that adaptively chooses its regularization function based on the loss function. See the paper `Adaptive Bound Optimization for Online Convex Optimization <https://arxiv.org/abs/1002.4908>`_. For the engineering document, see `Ad Click Prediction: a View from the Trenches <https://www.eecs.tufts.edu/~dsculley/papers/ad-click-prediction.pdf>`_.
@@ -3,17 +3,17 @@ mindspore.nn.FixedLossScaleUpdateCell

 .. py:class:: mindspore.nn.FixedLossScaleUpdateCell(loss_scale_value)

- Cell with a fixed gradient magnification factor.
+ Cell with a fixed loss scale factor.

 This class is the return value of the `get_update_cell` method of :class:`mindspore.nn.FixedLossScaleManager`. During training, :class:`mindspore.TrainOneStepWithLossScaleCell` calls this Cell.

 **Parameters:**

- - **loss_scale_value** (float) - The initial gradient magnification factor.
+ - **loss_scale_value** (float) - The initial loss scale.

 **Inputs:**

- - **loss_scale** (Tensor) - The gradient magnification factor during training, with shape :math:`()`; in this class the value is ignored.
+ - **loss_scale** (Tensor) - The loss scale during training, a scalar. In this class the value is ignored.
 - **overflow** (bool) - Whether overflow occurs.

 **Outputs:**

@@ -54,4 +54,4 @@ mindspore.nn.FixedLossScaleUpdateCell

 .. py:method:: get_loss_scale()

- Get the current gradient magnification factor.
+ Get the current loss scale.
@@ -3,7 +3,7 @@ mindspore.nn.LARS

 .. py:class:: mindspore.nn.LARS(*args, **kwargs)

- Implements the LARS algorithm with the LARSUpdate operator.
+ Implementation of the LARS algorithm.

 LARS is an optimization algorithm employing a large-batch optimization technique. See the paper `LARGE BATCH TRAINING OF CONVOLUTIONAL NETWORKS <https://arxiv.org/abs/1708.03888>`_.

@@ -3,7 +3,7 @@ mindspore.nn.Lamb

 .. py:class:: mindspore.nn.Lamb(*args, **kwargs)

- LAMB (Layer-wise Adaptive Moments optimizer for Batching training) algorithm optimizer.
+ Implementation of the LAMB (Layer-wise Adaptive Moments optimizer for Batching training) algorithm.

 LAMB is an optimization algorithm employing a layerwise adaptive large-batch optimization technique. See the paper `LARGE BATCH OPTIMIZATION FOR DEEP LEARNING: TRAINING BERT IN 76 MINUTES <https://arxiv.org/abs/1904.00962>`_.

@@ -3,7 +3,7 @@ mindspore.nn.LazyAdam

 .. py:class:: mindspore.nn.LazyAdam(*args, **kwargs)

- Updates gradients by the Adaptive Moment Estimation (Adam) algorithm. Refer to the paper `Adam: A Method for Stochastic Optimization <https://arxiv.org/abs/1412.6980>`_.
+ Implementation of the Adaptive Moment Estimation (Adam) algorithm. Refer to the paper `Adam: A Method for Stochastic Optimization <https://arxiv.org/abs/1412.6980>`_.

 When the gradients are sparse, this optimizer applies the lazy Adam algorithm.
@@ -3,7 +3,7 @@ mindspore.nn.Momentum

 .. py:class:: mindspore.nn.Momentum(*args, **kwargs)

- Momentum algorithm optimizer.
+ Implementation of the Momentum algorithm.

 For more details, refer to the paper `On the importance of initialization and momentum in deep learning <https://dl.acm.org/doi/10.5555/3042817.3043064>`_.

@@ -3,7 +3,7 @@ mindspore.nn.ProximalAdagrad

 .. py:class:: mindspore.nn.ProximalAdagrad(*args, **kwargs)

- Implements the ProximalAdagrad algorithm with the ApplyProximalAdagrad operator.
+ Implementation of the ProximalAdagrad algorithm.

 ProximalAdagrad is used for online learning and stochastic optimization.
 Refer to the paper `Efficient Learning using Forward-Backward Splitting <http://papers.nips.cc//paper/3793-efficient-learning-using-forward-backward-splitting.pdf>`_.

@@ -3,7 +3,7 @@ mindspore.nn.RMSProp

 .. py:class:: mindspore.nn.RMSProp(*args, **kwargs)

- Implements the Root Mean Square Propagation (RMSProp) algorithm.
+ Implementation of the Root Mean Square Propagation (RMSProp) algorithm.

 Updates `params` according to the RMSProp algorithm; see page 29 of [http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf].

@@ -3,7 +3,7 @@ mindspore.nn.SGD

 .. py:class:: mindspore.nn.SGD(*args, **kwargs)

- Implements stochastic gradient descent. Momentum is optional.
+ Implementation of stochastic gradient descent. Momentum is optional.

 For an introduction to SGD, see `SGD <https://en.wikipedia.org/wiki/Stochastic_gradient_dencent>`_.
@@ -3,16 +3,16 @@ mindspore.nn.TrainOneStepWithLossScaleCell

 .. py:class:: mindspore.nn.TrainOneStepWithLossScaleCell(network, optimizer, scale_sense)

- Training network with gradient magnification (loss scale).
+ Training network with mixed precision.

- Implements a single training step with gradient magnification. It takes a network, an optimizer and a Cell (or a Tensor) used to update the gradient magnification factor as arguments. The gradient magnification factor can be updated on the host side or on the device side.
- To update it on the host side, use a Tensor as `scale_sense`; otherwise, use a Cell instance that can update the gradient magnification factor as `scale_sense`.
+ Implements a single training step with loss scaling (loss scale). It takes a network, an optimizer and a Cell (or a Tensor) used to update the loss scale factor (loss scale) as arguments. The loss scale can be updated on the host side or on the device side.
+ To update it on the host side, use a Tensor as `scale_sense`; otherwise, use a Cell instance that can update the loss scale as `scale_sense`.

 **Parameters:**

 - **network** (Cell) - The training network. Only networks with a single output are supported.
 - **optimizer** (Cell) - The optimizer used to update the network parameters.
- - **scale_sense** (Union[Tensor, Cell]) - If the value is of Cell type, `TrainOneStepWithLossScaleCell` calls it to update the gradient magnification factor. If the value is of Tensor type, `set_sense_scale` can be called to update the gradient magnification factor; the shape is :math:`()` or :math:`(1,)`.
+ - **scale_sense** (Union[Tensor, Cell]) - If the value is of Cell type, `TrainOneStepWithLossScaleCell` calls it to update the loss scale. If the value is of Tensor type, `set_sense_scale` can be called to update the loss scale; the shape is :math:`()` or :math:`(1,)`.

 **Inputs:**
@@ -20,11 +20,11 @@ mindspore.nn.TrainOneStepWithLossScaleCell

 **Outputs:**

- Tuple of three Tensors: the loss value, the overflow status and the current gradient magnification factor.
+ Tuple of three Tensors: the loss value, the overflow status and the current loss scale.

- - **loss** (Tensor) - Tensor with shape :math:`()`.
- - **overflow** (Tensor) - Tensor with shape :math:`()`, of type bool.
- - **loss scale** (Tensor) - Tensor with shape :math:`()`.
+ - **loss** (Tensor) - A scalar, the loss value.
+ - **overflow** (Tensor) - A scalar of type bool, indicating whether overflow occurs.
+ - **loss scale** (Tensor) - The loss scale value, with shape :math:`()` or :math:`(1,)`.

 **Raises:**
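A hedged end-to-end sketch of the wrapper (placeholder network and data; using the dynamic update Cell as `scale_sense`, so the loss scale is updated inside the training Cell):

```python
import numpy as np
import mindspore as ms
from mindspore import nn

net = nn.Dense(16, 4)
loss_fn = nn.MSELoss()
opt = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9)

net_with_loss = nn.WithLossCell(net, loss_fn)
scale_sense = nn.DynamicLossScaleUpdateCell(loss_scale_value=2**12,
                                            scale_factor=2, scale_window=1000)
train_net = nn.TrainOneStepWithLossScaleCell(net_with_loss, opt, scale_sense)
train_net.set_train()

data = ms.Tensor(np.random.randn(8, 16).astype(np.float32))
label = ms.Tensor(np.random.randn(8, 4).astype(np.float32))
loss, overflow, scale = train_net(data, label)   # the documented three-Tensor tuple
```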
@@ -94,7 +94,7 @@ mindspore.nn.TrainOneStepWithLossScaleCell

 .. py:method:: process_loss_scale(overflow)

- Compute the gradient magnification factor according to the overflow status.
+ Compute the loss scale according to the overflow status.

 This interface can be reused when customizing a training network by inheriting this class.

@@ -113,7 +113,7 @@ mindspore.nn.TrainOneStepWithLossScaleCell

 **Parameters:**

- **sens** (Tensor) - The new gradient magnification factor; its shape and type must be the same as the original `scale_sense`.
+ **sens** (Tensor) - The new loss scale; its shape and type must be the same as the original `scale_sense`.

 .. py:method:: start_overflow_check(pre_cond, compute_input)
@@ -9,7 +9,7 @@ mindspore.nn.WithLossCell

 **Parameters:**

- - **backbone** (Cell) - The target network to wrap.
+ - **backbone** (Cell) - The backbone network to wrap.
 - **loss_fn** (Cell) - The loss function used to compute the loss.

 **Inputs:**
@@ -1,2 +1 @@
- When the parameters are not grouped, the `weight_decay` configured in the optimizer is applied to the network parameters whose names contain "beta" or "gamma"; the weight decay strategy can be adjusted by grouping the network parameters. When grouping, each group of network parameters can configure its own `weight_decay`; if it is not configured, the `weight_decay` configured in the optimizer is used for that group.
+ When the parameters are not grouped, the `weight_decay` configured in the optimizer is applied to the network parameters whose names do not contain "beta" or "gamma"; when the parameters are grouped, the weight decay strategy can be adjusted per group. When grouping, each group of network parameters can configure its own `weight_decay`; if it is not configured, the `weight_decay` configured in the optimizer is used for that group.

@@ -6,4 +6,4 @@

 .. py:method:: unique
 :property:

- This property indicates whether gradient deduplication is performed in the optimizer, which is usually used for sparse networks. Set it to True if the gradients are sparse. Set it to False if the forward sparse network has already deduplicated the weights, i.e. the gradients are dense. The default is True when it is not set.
+ This property indicates whether gradient deduplication is performed in the optimizer, which is usually used for sparse networks. Set it to True if the gradients are sparse. Set it to False if the forward sparse network has already deduplicated the weights, i.e. the gradients are dense. The default is True when nothing is configured.
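A hedged sketch of toggling the property on an optimizer instance, assuming `unique` is the writable optimizer property described above (placeholder network; the value only matters for sparse-gradient workloads):

```python
from mindspore import nn

net = nn.Dense(16, 4)
opt = nn.LazyAdam(net.trainable_params(), learning_rate=1e-3)

print(opt.unique)    # True when nothing is configured
opt.unique = False   # the forward network already deduplicates, so gradients are dense
```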
@@ -38,7 +38,7 @@ def _check_param_value(accum, update_slots, prim_name=None):

 class Adagrad(Optimizer):
     r"""
-    Implements the Adagrad algorithm with ApplyAdagrad Operator.
+    Implements the Adagrad algorithm.

     Adagrad is an online Learning and Stochastic Optimization.
     Refer to paper `Efficient Learning using Forward-Backward Splitting

@@ -49,7 +49,7 @@ class Adagrad(Optimizer):

     .. math::
         \begin{array}{ll} \\
-            h_{t+1} = h_{t} + g\\
+            h_{t+1} = h_{t} + g*g\\
             w_{t+1} = w_{t} - lr*\frac{1}{\sqrt{h_{t+1}}}*g
         \end{array}
@@ -188,7 +188,7 @@ def _check_param_value(beta1, beta2, eps, prim_name):

 class Adam(Optimizer):
     r"""
-    Updates gradients by the Adaptive Moment Estimation (Adam) algorithm.
+    Implements the Adaptive Moment Estimation (Adam) algorithm.

     The Adam optimizer can dynamically adjust the learning rate of each parameter using the first-order
     moment estimation and the second-order moment estimation of the gradient.

@@ -75,7 +75,7 @@ def _check_param(initial_accum, lr_power, l1, l2, use_locking, prim_name=None):

 class FTRL(Optimizer):
     r"""
-    Implements the FTRL algorithm with ApplyFtrl Operator.
+    Implements the FTRL algorithm.

     FTRL is an online convex optimization algorithm that adaptively chooses its regularization function
     based on the loss functions. Refer to paper `Adaptive Bound Optimization for Online Convex Optimization

@@ -172,7 +172,7 @@ def _check_param_value(beta1, beta2, eps, prim_name):

 class Lamb(Optimizer):
     r"""
-    An optimizer that implements the Lamb(Layer-wise Adaptive Moments optimizer for Batching training) algorithm.
+    Implements the Lamb(Layer-wise Adaptive Moments optimizer for Batching training) algorithm.

     LAMB is an optimization algorithm employing a layerwise adaptive large batch optimization technique.
     Refer to the paper `LARGE BATCH OPTIMIZATION FOR DEEP LEARNING: TRAINING BERT IN 76

@@ -49,7 +49,7 @@ def _check_param_value(optimizer, epsilon, coefficient, use_clip, prim_name):

 class LARS(Optimizer):
     r"""
-    Implements the LARS algorithm with LARSUpdate Operator.
+    Implements the LARS algorithm.

     LARS is an optimization algorithm employing a large batch optimization technique. Refer to paper `LARGE BATCH
     TRAINING OF CONVOLUTIONAL NETWORKS <https://arxiv.org/abs/1708.03888>`_.
@@ -106,7 +106,7 @@ def _check_param_value(beta1, beta2, eps, weight_decay, prim_name):

 class LazyAdam(Optimizer):
     r"""
-    Updates gradients by the Adaptive Moment Estimation (Adam) algorithm. The Adam algorithm is proposed
+    Implements the Adaptive Moment Estimation (Adam) algorithm. The Adam algorithm is proposed
     in `Adam: A Method for Stochastic Optimization <https://arxiv.org/abs/1412.6980>`_.

     This optimizer will apply a lazy adam algorithm when gradient is sparse.

@@ -40,7 +40,7 @@ def _tensor_run_opt_ext(opt, momentum, learning_rate, gradient, weight, moment,

 class Momentum(Optimizer):
     r"""
-    An optimizer that implements the Momentum algorithm.
+    Implements the Momentum algorithm.

     Refer to the paper `On the importance of initialization and momentum in deep
     learning <https://dl.acm.org/doi/10.5555/3042817.3043064>`_ for more details.

@@ -54,7 +54,7 @@ def _check_param_value(accum, l1, l2, use_locking, prim_name=None):

 class ProximalAdagrad(Optimizer):
     r"""
-    Implements the ProximalAdagrad algorithm with ApplyProximalAdagrad Operator.
+    Implements the ProximalAdagrad algorithm.

     ProximalAdagrad is an online Learning and Stochastic Optimization.
     Refer to paper `Efficient Learning using Forward-Backward Splitting

@@ -73,7 +73,7 @@ class WithLossCell(Cell):

     the computed loss will be returned.

     Args:
-        backbone (Cell): The target network to wrap.
+        backbone (Cell): The backbone network to wrap.
         loss_fn (Cell): The loss function used to compute loss.

     Inputs:
@@ -245,9 +245,9 @@ class TrainOneStepWithLossScaleCell(TrainOneStepCell):

     Outputs:
         Tuple of 3 Tensor, the loss, overflow flag and current loss scale value.

-        - **loss** (Tensor) - Tensor with shape :math:`()`.
-        - **overflow** (Tensor) - Tensor with shape :math:`()`, type is bool.
-        - **loss scale** (Tensor) - Tensor with shape :math:`()`
+        - **loss** (Tensor) - A scalar, the loss value.
+        - **overflow** (Tensor) - A scalar, whether overflow occur or not, the type is bool.
+        - **loss scale** (Tensor) - The loss scale value, the shape is :math:`()` or :math:`(1,)`.

     Raises:
         TypeError: If `scale_sense` is neither Cell nor Tensor.
@@ -140,45 +140,46 @@ def build_train_network(network, optimizer, loss_fn=None, level='O0', boost_leve

     Args:
         network (Cell): Definition of the network.
-        loss_fn (Union[None, Cell]): Definition of the loss_fn. If None, the `network` should have the loss inside.
+        loss_fn (Union[None, Cell]): Define the loss function. If None, the `network` should have the loss inside.
             Default: None.
-        optimizer (Optimizer): Optimizer to update the Parameter.
+        optimizer (Optimizer): Define the optimizer to update the Parameter.
         level (str): Supports ["O0", "O2", "O3", "auto"]. Default: "O0".

-            - O0: Do not change.
-            - O2: Cast network to float16, keep batchnorm and `loss_fn` (if set) run in float32,
+            - "O0": Do not change.
+            - "O2": Cast network to float16, keep batchnorm and `loss_fn` (if set) run in float32,
              using dynamic loss scale.
-            - O3: Cast network to float16, with additional property `keep_batchnorm_fp32=False` .
-            - auto: Set to level to recommended level in different devices. Set level to O2 on GPU, Set
-              level to O3 Ascend. The recommended level is chosen by the export experience, cannot
-              always general. User should specify the level for special network.
+            - "O3": Cast network to float16, with additional property `keep_batchnorm_fp32=False` .
+            - auto: Set to level to recommended level in different devices. Set level to "O2" on GPU, Set
+              level to "O3" Ascend. The recommended level is chosen by the export experience, not applicable to all
+              scenarios. User should specify the level for special network.

-            O2 is recommended on GPU, O3 is recommended on Ascend. Property of `keep_batchnorm_fp32`, `cast_model_type`
-            and `loss_scale_manager` determined by `level` setting may be overwritten by settings in `kwargs`.
+            "O2" is recommended on GPU, "O3" is recommended on Ascend. Property of `keep_batchnorm_fp32`,
+            `cast_model_type` and `loss_scale_manager` determined by `level` setting may be overwritten by settings in
+            `kwargs`.

         boost_level (str): Option for argument `level` in `mindspore.boost` , level for boost mode
             training. Supports ["O0", "O1", "O2"]. Default: "O0".

-            - O0: Do not change.
-            - O1: Enable the boost mode, the performance is improved by about 20%, and
+            - "O0": Do not change.
+            - "O1": Enable the boost mode, the performance is improved by about 20%, and
              the accuracy is the same as the original accuracy.
-            - O2: Enable the boost mode, the performance is improved by about 30%, and
+            - "O2": Enable the boost mode, the performance is improved by about 30%, and
              the accuracy is reduced by less than 3%.

-            If O1 or O2 mode is set, the boost related library will take effect automatically.
+            If "O1" or "O2" mode is set, the boost related library will take effect automatically.

         cast_model_type (:class:`mindspore.dtype`): Supports `mstype.float16` or `mstype.float32` . If set, the
             network will be casted to `cast_model_type` ( `mstype.float16` or `mstype.float32` ), but not to be casted
             to the type determined by `level` setting.
         keep_batchnorm_fp32 (bool): Keep Batchnorm run in `float32` when the network is set to cast to `float16` . If
             set, the `level` setting will take no effect on this property.
-        loss_scale_manager (Union[None, LossScaleManager]): If None, not scale the loss, otherwise scale the loss by
-            `LossScaleManager` . If set, the `level` setting will take no effect on this property.
+        loss_scale_manager (Union[None, LossScaleManager]): If not None, must be subclass of
+            :class:`mindspore.LossScaleManager` for scaling the loss. If set, the `level` setting will take no effect
+            on this property.

     Raises:
-        ValueError: Auto mixed precision only supported on device GPU and Ascend. If device is CPU, a `ValueError`
-            exception will be raised.
-        ValueError: If device is CPU, property `loss_scale_manager` only can be set as `None` or `FixedLossScaleManager`
-            (with property `drop_overflow_update=False` ), or a `ValueError` exception will be raised.
+        ValueError: If device is CPU, property `loss_scale_manager` is not `None` or `FixedLossScaleManager`
+            (with property `drop_overflow_update=False` ).
     """
     validator.check_value_type('network', network, nn.Cell)
     validator.check_value_type('optimizer', optimizer, (nn.Optimizer, boost.FreezeOpt))
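A hedged sketch of overriding the level-derived properties through the documented keyword arguments (placeholder network; the combination shown is the CPU-safe one from the Raises section):

```python
import mindspore as ms
from mindspore import nn
from mindspore import dtype as mstype

net = nn.Dense(16, 4)
opt = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9)

train_net = ms.build_train_network(
    net, opt, loss_fn=nn.MSELoss(), level="O0",
    cast_model_type=mstype.float32,          # overrides what `level` would choose
    keep_batchnorm_fp32=True,
    loss_scale_manager=ms.FixedLossScaleManager(loss_scale=128.0,
                                                drop_overflow_update=False))
```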
@@ -20,7 +20,8 @@ from .. import nn

 class LossScaleManager:
     """
-    Loss scale (Magnification factor of gradients when mix precision is used) manager abstract class.
+    Loss scale (Magnification factor of gradients when mix precision is used) manager abstract class when using
+    mixed precision.

     Derived class needs to implement all of its methods. `get_loss_scale` is used to get current loss scale value.
     `update_loss_scale` is used to update loss scale value, `update_loss_scale` will be called during the training.
@@ -114,15 +114,15 @@ class Model:

         amp_level (str): Option for argument `level` in :func:`mindspore.build_train_network`, level for mixed
             precision training. Supports ["O0", "O2", "O3", "auto"]. Default: "O0".

-            - O0: Do not change.
-            - O2: Cast network to float16, keep batchnorm run in float32, using dynamic loss scale.
-            - O3: Cast network to float16, the batchnorm is also cast to float16, loss scale will not be used.
-            - auto: Set level to recommended level in different devices. Set level to O2 on GPU, set
-              level to O3 on Ascend. The recommended level is chosen by the export experience, not applicable to all
+            - "O0": Do not change.
+            - "O2": Cast network to float16, keep BatchNorm run in float32, using dynamic loss scale.
+            - "O3": Cast network to float16, the BatchNorm is also cast to float16, loss scale will not be used.
+            - auto: Set level to recommended level in different devices. Set level to "O2" on GPU, set
+              level to "O3" on Ascend. The recommended level is chosen by the export experience, not applicable to all
              scenarios. User should specify the level for special network.

-            O2 is recommended on GPU, O3 is recommended on Ascend.
-            The batchnorm strategy can be changed by `keep_batchnorm_fp32` settings in `kwargs`. `keep_batchnorm_fp32`
+            "O2" is recommended on GPU, "O3" is recommended on Ascend.
+            The BatchNorm strategy can be changed by `keep_batchnorm_fp32` settings in `kwargs`. `keep_batchnorm_fp32`
             must be a bool. The loss scale strategy can be changed by `loss_scale_manager` setting in `kwargs`.
             `loss_scale_manager` should be a subclass of :class:`mindspore.LossScaleManager`.
             The more detailed explanation of `amp_level` setting can be found at `mindspore.build_train_network`.

@@ -130,10 +130,10 @@ class Model:

         boost_level (str): Option for argument `level` in `mindspore.boost`, level for boost mode
             training. Supports ["O0", "O1", "O2"]. Default: "O0".

-            - O0: Do not change.
-            - O1: Enable the boost mode, the performance is improved by about 20%, and
+            - "O0": Do not change.
+            - "O1": Enable the boost mode, the performance is improved by about 20%, and
              the accuracy is the same as the original accuracy.
-            - O2: Enable the boost mode, the performance is improved by about 30%, and
+            - "O2": Enable the boost mode, the performance is improved by about 30%, and
              the accuracy is reduced by less than 3%.

             If you want to config boost mode by yourself, you can set boost_config_dict as `boost.py`.
@@ -906,11 +906,10 @@ class Model:

         Configure to pynative mode or CPU, the evaluating process will be performed with dataset non-sink mode.

         Note:
-            If dataset_sink_mode is True, data will be sent to device. If the device is Ascend, features
+            If dataset_sink_mode is True, data will be sent to device. At this point, the dataset will be bound to this
+            model, so the dataset cannot be used by other models. If the device is Ascend, features
             of data will be transferred one by one. The limitation of data transmission per time is 256M.

-            If dataset_sink_mode is True, dataset will be bound to this model and cannot be used by other models.

             The interface builds the computational graphs and then executes the computational graphs. However, when
             the `Model.build` is executed first, it only performs the graphs execution.

@@ -1107,7 +1106,8 @@ class Model:

         Batch data should be put together in one tensor.

         Args:
-            predict_data (Tensor): One tensor or multiple tensors of predict data.
+            predict_data (Optional[Tensor, list[Tensor], tuple[Tensor]]): The predict data, can be a single tensor,
+                a list of tensor, or a tuple of tensor.

         Returns:
             Dict, Parameter layout dictionary used for load distributed checkpoint.