forked from mindspore-Ecosystem/mindspore
!5063 modify sgd and momentum and WithGradCell comments
Merge pull request !5063 from lijiaqi/momentum_and_sgd
commit 15ae3702f9
@@ -56,12 +56,12 @@ class Momentum(Optimizer):
    .. math::
            v_{t} = v_{t-1} \ast u + gradients

    If use_nesterov is True:

    .. math::
            p_{t} = p_{t-1} - (grad \ast lr + v_{t} \ast u \ast lr)

    If use_nesterov is False:

    .. math::
            p_{t} = p_{t-1} - lr \ast v_{t}

    Here, grad, lr, p, v and u denote the gradients, learning_rate, params, moments, and momentum respectively.
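
A minimal plain-Python sketch of the update rule above (illustrative only; the helper name momentum_update is hypothetical and this is not the MindSpore kernel)::

    import numpy as np

    def momentum_update(p, grad, v, lr, u, use_nesterov=False):
        """One step of the momentum update described in the docstring."""
        v = u * v + grad                      # v_{t} = v_{t-1} * u + gradients
        if use_nesterov:
            p = p - (grad * lr + v * u * lr)  # p_{t} = p_{t-1} - (grad*lr + v_{t}*u*lr)
        else:
            p = p - lr * v                    # p_{t} = p_{t-1} - lr * v_{t}
        return p, v

    p, v = np.ones(3), np.zeros(3)
    p, v = momentum_update(p, np.array([0.1, -0.2, 0.3]), v, lr=0.01, u=0.9)
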
@@ -49,12 +49,12 @@ class SGD(Optimizer):
    .. math::
            v_{t+1} = u \ast v_{t} + gradient \ast (1-dampening)

    If nesterov is True:

    .. math::
            p_{t+1} = p_{t} - lr \ast (gradient + u \ast v_{t+1})

    If nesterov is False:

    .. math::
            p_{t+1} = p_{t} - lr \ast v_{t+1}

    Note that for the first step, v_{t+1} = gradient.
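
A matching sketch for the SGD rule (again illustrative; sgd_update is a hypothetical helper, and the step == 0 branch encodes the first-step note above)::

    def sgd_update(p, grad, v, lr, u, dampening=0.0, nesterov=False, step=0):
        """One step of the SGD-with-momentum update described in the docstring."""
        if step == 0:
            v = grad                            # first step: v_{t+1} = gradient
        else:
            v = u * v + grad * (1 - dampening)  # v_{t+1} = u*v_{t} + gradient*(1-dampening)
        if nesterov:
            p = p - lr * (grad + u * v)         # p_{t+1} = p_{t} - lr*(gradient + u*v_{t+1})
        else:
            p = p - lr * v                      # p_{t+1} = p_{t} - lr*v_{t+1}
        return p, v
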
@@ -82,7 +82,7 @@ class WithGradCell(Cell):

    Wraps the network with a backward cell to compute gradients. A network with a loss function is necessary
    as an argument. If the loss function is None, the network must be a wrapper of the network and the loss function. This
-    Cell accepts *inputs as inputs and returns gradients for each trainable parameter.
+    Cell accepts '*inputs' as inputs and returns gradients for each trainable parameter.

    Note:
        Run in PyNative mode.
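
A short usage sketch of WithGradCell (assumes a working MindSpore install; the network, loss function, and shapes here are made up for illustration)::

    import numpy as np
    import mindspore.nn as nn
    from mindspore import Tensor, context

    context.set_context(mode=context.PYNATIVE_MODE)  # the Note above says to run in PyNative mode

    net = nn.Dense(3, 1)                      # any trainable network
    loss_fn = nn.MSELoss()
    grad_net = nn.WithGradCell(net, loss_fn)  # wraps network and loss to return gradients

    x = Tensor(np.random.rand(2, 3).astype(np.float32))
    label = Tensor(np.random.rand(2, 1).astype(np.float32))
    grads = grad_net(x, label)                # one gradient per trainable parameter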