qml.AdamOptimizer
- class AdamOptimizer(stepsize=0.01, beta1=0.9, beta2=0.99, eps=1e-08)
Bases: pennylane.optimize.gradient_descent.GradientDescentOptimizer
Gradient-descent optimizer with adaptive learning rate, first and second moment.
Adaptive Moment Estimation uses a step-dependent learning rate, a first moment \(a\) and a second moment \(b\), reminiscent of the momentum and velocity of a particle:
\[x^{(t+1)} = x^{(t)} - \eta^{(t+1)} \frac{a^{(t+1)}}{\sqrt{b^{(t+1)}} + \epsilon },\]
where the update rules for the two moments and the step-dependent learning rate are given by
\[\begin{split}a^{(t+1)} &= \beta_1 a^{(t)} + (1-\beta_1) \nabla f(x^{(t)}),\\ b^{(t+1)} &= \beta_2 b^{(t)} + (1-\beta_2) (\nabla f(x^{(t)}))^{\odot 2},\\ \eta^{(t+1)} &= \eta \frac{\sqrt{(1-\beta_2^{t+1})}}{(1-\beta_1^{t+1})}.\end{split}\]
Above, \(( \nabla f(x^{(t)}))^{\odot 2}\) denotes the element-wise square operation, which means that each element in the gradient is multiplied by itself. The hyperparameters \(\beta_1\) and \(\beta_2\) can also be step-dependent. Initially, the first and second moments are zero.
The shift \(\epsilon\) avoids division by zero.
For more details, see arXiv:1412.6980.
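The update rules above can be sketched in plain NumPy. This is a minimal illustration of the formulas, not the library's implementation; the function name adam_step, the toy objective f and its gradient grad_f are hypothetical names introduced only for this example.

```python
import numpy as np


def adam_step(x, grad, a, b, t, stepsize=0.01, beta1=0.9, beta2=0.99, eps=1e-8):
    """Apply one Adam update; t is the number of steps already taken."""
    t = t + 1
    # First and second moment updates
    a = beta1 * a + (1 - beta1) * grad
    b = beta2 * b + (1 - beta2) * grad**2  # element-wise square
    # Step-dependent learning rate
    eta_t = stepsize * np.sqrt(1 - beta2**t) / (1 - beta1**t)
    # Parameter update, with eps avoiding division by zero
    x = x - eta_t * a / (np.sqrt(b) + eps)
    return x, a, b, t


def f(x):
    # Toy objective (assumption for illustration): sum of squares
    return np.sum(x**2)


def grad_f(x):
    return 2 * x


x0 = np.array([1.0, -2.0])
x = x0.copy()
a = np.zeros_like(x)  # first moment starts at zero
b = np.zeros_like(x)  # second moment starts at zero
t = 0

# A single step from zero moments moves each coordinate by about `stepsize`
x_first, _, _, _ = adam_step(x, grad_f(x), a, b, t, stepsize=0.1)

for _ in range(200):
    x, a, b, t = adam_step(x, grad_f(x), a, b, t, stepsize=0.1)
```

With both moments initialized to zero, the first step reduces to roughly stepsize times the sign of the gradient, which is why the learning-rate correction matters most early in the optimization.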
- Parameters
stepsize (float) – the user-defined hyperparameter \(\eta\)
beta1 (float) – hyperparameter \(\beta_1\) governing the update of the first moment \(a\)
beta2 (float) – hyperparameter \(\beta_2\) governing the update of the second moment \(b\)
eps (float) – offset \(\epsilon\) added for numerical stability
Note
When using the torch, tensorflow or jax interfaces, refer to Gradients and training for suitable optimizers.
Attributes
fm – Returns estimated first moments of gradient
sm – Returns estimated second moments of gradient
t – Returns accumulated timesteps
- fm
Returns estimated first moments of gradient
- sm
Returns estimated second moments of gradient
- t
Returns accumulated timesteps
Methods
apply_grad(grad, args) – Update the variables args to take a single optimization step.
compute_grad(objective_fn, args, kwargs[, ...]) – Compute the gradient of the objective function at the given point and return it along with the objective function forward pass (if available).
reset() – Reset optimizer by erasing memory of past steps.
step(objective_fn, *args[, grad_fn]) – Update trainable arguments with one step of the optimizer.
step_and_cost(objective_fn, *args[, grad_fn]) – Update trainable arguments with one step of the optimizer and return the corresponding objective function value prior to the step.
- apply_grad(grad, args)
Update the variables args to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.
- Parameters
grad (tuple[ndarray]) – the gradient of the objective function at point \(x^{(t)}\): \(\nabla f(x^{(t)})\)
args (tuple) – the current value of the variables \(x^{(t)}\)
- Returns
the new values \(x^{(t+1)}\)
- Return type
list
- static compute_grad(objective_fn, args, kwargs, grad_fn=None)
Compute the gradient of the objective function at the given point and return it along with the objective function forward pass (if available).
- Parameters
objective_fn (function) – the objective function for optimization
args (tuple) – tuple of NumPy arrays containing the current parameters for the objective function
kwargs (dict) – keyword arguments for the objective function
grad_fn (function) – optional gradient function of the objective function with respect to the variables args. If None, the gradient function is computed automatically. Must return the same shape of tuple [array] as the autograd derivative.
- Returns
NumPy array containing the gradient \(\nabla f(x^{(t)})\) and the objective function output. If grad_fn is provided, the objective function will not be evaluated and instead None will be returned.
- Return type
tuple (array)
- step(objective_fn, *args, grad_fn=None, **kwargs)
Update trainable arguments with one step of the optimizer.
- Parameters
objective_fn (function) – the objective function for optimization
*args – variable length argument list for objective function
grad_fn (function) – optional gradient function of the objective function with respect to the variables *args. If None, the gradient function is computed automatically. Must return a tuple[array] with the same number of elements as *args. Each array of the tuple should have the same shape as the corresponding argument.
**kwargs – variable length of keyword arguments for the objective function
- Returns
the new variable values \(x^{(t+1)}\). If a single argument is provided, list [array] is replaced by array.
- Return type
list [array]
- step_and_cost(objective_fn, *args, grad_fn=None, **kwargs)
Update trainable arguments with one step of the optimizer and return the corresponding objective function value prior to the step.
- Parameters
objective_fn (function) – the objective function for optimization
*args – variable length argument list for objective function
grad_fn (function) – optional gradient function of the objective function with respect to the variables *args. If None, the gradient function is computed automatically. Must return a tuple[array] with the same number of elements as *args. Each array of the tuple should have the same shape as the corresponding argument.
**kwargs – variable length of keyword arguments for the objective function
- Returns
the new variable values \(x^{(t+1)}\) and the objective function output prior to the step. If a single argument is provided, list [array] is replaced by array.
- Return type
tuple[list [array], float]