qml.SPSAOptimizer¶
- class SPSAOptimizer(maxiter=None, alpha=0.602, gamma=0.101, c=0.2, A=None, a=None)[source]¶
Bases:
object
The Simultaneous Perturbation Stochastic Approximation method (SPSA) is a stochastic approximation algorithm for optimizing cost functions whose evaluation may involve noise.
While other gradient-based optimization methods usually attempt to compute the gradient analytically, SPSA involves approximating gradients at the cost of evaluating the cost function twice in each iteration step. This cost may result in a significant decrease in the overall cost of function evaluations for the entire optimization. It is based on an approximation of the unknown gradient \(\hat{g}(\hat{\theta}_{k})\) through a simultaneous perturbation of the input parameters:
\[\begin{split}\hat{g}_k(\hat{\theta}_k) = \frac{y(\hat{\theta}_k+c_k\Delta_k)- y(\hat{\theta}_k-c_k\Delta_k)}{2c_k} \begin{bmatrix} \Delta_{k1}^{-1} \\ \Delta_{k2}^{-1} \\ \vdots \\ \Delta_{kp}^{-1} \end{bmatrix}\text{,}\end{split}\]where
\(k\) is the current iteration step,
\(\hat{\theta}_k\) are the input parameters at iteration step \(k\),
\(y\) is the objective function,
\(c_k=\frac{c}{k^\gamma}\) is the gain sequence corresponding to evaluation step size and it can be controlled with
scaling parameter \(c\) and
scaling exponent \(\gamma\)
\(\Delta_{ki}^{-1} \left(1 \leq i \leq p \right)\) are the inverted elements of random pertubation vector \(\Delta_k\).
\(\hat{\theta}_k\) is updated to a new set of parameters with
\[\hat{\theta}_{k+1} = \hat{\theta}_{k} - a_k\hat{g}_k(\hat{\theta}_k)\text{,}\]where the gain sequences \(a_k=\frac{a}{(A+k)^\alpha}\) controls parameter update step size.
The gain sequence \(a_k\) can be controlled with
scaling parameter \(a\),
scaling exponent \(\alpha\) and
stability constant \(A\)
For more details, see Spall (1998a).
Note
One SPSA iteration step of a cost function that involves computing the expectation value of a Hamiltonian with
M
terms requires \(2*M\) quantum device executions.The forward-pass value of the cost function is not computed when stepping the optimizer. Therefore, in case of using
step_and_cost
method instead ofstep
, the number of executions will include the cost function evaluations.
Examples:
For VQE/VQE-like problems, the objective function can be the following:
>>> from pennylane import numpy as np >>> coeffs = [0.2, -0.543, 0.4514] >>> obs = [qml.X(0) @ qml.Z(1), qml.Z(0) @ qml.Hadamard(2), ... qml.X(3) @ qml.Z(1)] >>> H = qml.Hamiltonian(coeffs, obs) >>> num_qubits = 4 >>> dev = qml.device("default.qubit", wires=num_qubits) >>> @qml.qnode(dev) ... def cost(params, num_qubits=1): ... qml.BasisState(np.array([1, 1, 0, 0]), wires=range(num_qubits)) ... for i in range(num_qubits): ... qml.Rot(*params[i], wires=0) ... qml.CNOT(wires=[2, 3]) ... qml.CNOT(wires=[2, 0]) ... qml.CNOT(wires=[3, 1]) ... return qml.expval(H) ... >>> params = np.random.normal(0, np.pi, (num_qubits, 3), requires_grad=True)
Once constructed, the cost function can be passed directly to the
step
orstep_and_cost
function of the optimizer:>>> max_iterations = 100 >>> opt = qml.SPSAOptimizer(maxiter=max_iterations) >>> for _ in range(max_iterations): ... params, energy = opt.step_and_cost(cost, params, num_qubits=num_qubits) >>> print(energy) -0.4294539602541956
The algorithm provided by SPSA does not rely on built-in automatic differentiation capabilities of the interface being used and therefore the optimizer can be used in more complex hybrid classical-quantum workflow with any of the interfaces:
>>> import tensorflow as tf >>> n_qubits = 1 >>> max_iterations = 20 >>> dev = qml.device("default.qubit", wires=n_qubits) >>> @qml.qnode(dev, interface="tf") ... def layer_fn_spsa(inputs, weights): ... qml.AngleEmbedding(inputs, wires=range(n_qubits)) ... qml.BasicEntanglerLayers(weights, wires=range(n_qubits)) ... return qml.expval(qml.Z(0)) ... >>> opt = qml.SPSAOptimizer(maxiter=max_iterations) ... def fn(params, tensor_in, tensor_out): ... with tf.init_scope(): ... for _ in range(max_iterations): ... # Some classical steps before the quantum computation ... params_a, layer_res = opt.step_and_cost(layer_fn_spsa, ... tf.constant(tensor_in), ... tf.Variable(params)) ... params = params_a[1] ... tensor_out = layer_res ... # Some classical steps after the quantum computation ... return layer_res ... >>> tensor_in = tf.Variable([0.27507603]) >>> tensor_out = tf.Variable([0]) >>> params = tf.Variable([[3.97507603], ... [3.12950603], ... [1.00854038], ... [1.25907603]]) >>> loss = fn(params, tensor_in, tensor_out) >>> print(loss) tf.Tensor(-0.9995854230771829, shape=(), dtype=float64)
- Keyword Arguments
maxiter (int) – the maximum number of iterations expected to be performed. Used to determine \(A\), if \(A\) is not supplied, otherwise ignored.
alpha (float) – A hyperparameter to calculate \(a_k=\frac{a}{(A+k+1)^\alpha}\) for each iteration. Its asymptotically optimal value is 1.0.
gamma (float) – An hyperparameter to calculate \(c_k=\frac{c}{(k+1)^\gamma}\) for each iteration. Its asymptotically optimal value is 1/6.
c (float) – A hyperparameter related to the expected noise. It should be approximately the standard deviation of the expected noise of the cost function.
A (float) – The stability constant; if not provided, set to be 10% of the maximum number of expected iterations.
a (float) – A hyperparameter expected to be small in noisy situations, its value could be picked using A, \(\alpha\) and \(\hat{g_0} (\hat{\theta_0})\). For more details, see Spall (1998b).
Methods
apply_grad
(grad, args)Update the variables to take a single optimization step.
compute_grad
(objective_fn, args, kwargs)Approximate the gradient of the objective function at the given point.
step
(objective_fn, *args, **kwargs)Update trainable arguments with one step of the optimizer.
step_and_cost
(objective_fn, *args, **kwargs)Update the parameter array \(\hat{\theta}_k\) with one step of the optimizer and return the step and the corresponding objective function.
- apply_grad(grad, args)[source]¶
Update the variables to take a single optimization step.
- Parameters
grad (tuple [array]) – the gradient approximation of the objective function at point \(\hat{\theta}_{k}\)
args (tuple) – the current value of the variables \(\hat{\theta}_{k}\)
- Returns
the new values \(\hat{\theta}_{k+1}\)
- Return type
list [array]
- compute_grad(objective_fn, args, kwargs)[source]¶
Approximate the gradient of the objective function at the given point.
- Parameters
objective_fn (function) – The objective function for optimization
args (tuple) – tuple of NumPy array containing the current parameters for objective function
kwargs (dict) – keyword arguments for the objective function
- Returns
- NumPy array containing the gradient
\(\hat{g}_k(\hat{\theta}_k)\)
- Return type
tuple (array)
- step(objective_fn, *args, **kwargs)[source]¶
Update trainable arguments with one step of the optimizer. The number of steps is being counted through calls to
step_and_cost
andcost
.- Parameters
objective_fn (function) – the objective function for optimization
*args – variable length argument array for objective function
**kwargs – variable length of keyword arguments for the objective function
- Returns
the new variable values \(\hat{\theta}_{k+1}\).
- Return type
list [array]
- step_and_cost(objective_fn, *args, **kwargs)[source]¶
Update the parameter array \(\hat{\theta}_k\) with one step of the optimizer and return the step and the corresponding objective function. The number of steps stored by the
k
attribute of the optimizer is counted internally when callingstep_and_cost
andcost
.- Parameters
objective_fn (function) – the objective function for optimization
*args – variable length argument array for objective function
**kwargs – variable length of keyword arguments for the objective function
- Returns
the new variable values \(\hat{\theta}_{k+1}\) and the objective function output prior to the step.
- Return type
tuple[list [array], float]