qml.ShotAdaptiveOptimizer
- class ShotAdaptiveOptimizer(min_shots, term_sampling=None, mu=0.99, b=1e-06, stepsize=0.07)
Bases:
pennylane.optimize.gradient_descent.GradientDescentOptimizer
Optimizer where the shot rate is adaptively calculated using the variances of the parameter-shift gradient.
By keeping a running average of the parameter-shift gradient and the variance of the parameter-shift gradient, this optimizer frugally distributes a shot budget across the partial derivatives of each parameter.
In addition, weighted random sampling can be used to further distribute the shot budget across the local terms from which the Hamiltonian is constructed.
Note
The shot adaptive optimizer only supports single QNode objects as objective functions. The bound device must also be instantiated with a finite number of shots.
- Parameters
min_shots (int) – The minimum number of shots used to estimate the expectations of each term in the Hamiltonian. Note that this must be larger than 2 for the variance of the gradients to be computed.
mu (float) – The running average constant \(\mu \in [0, 1]\). Used to control how quickly the number of shots recommended for each gradient component changes.
b (float) – Regularization bias. The bias should be kept small, but non-zero.
term_sampling (str) – The random sampling algorithm used to multinomially distribute the shot budget across the terms in the Hamiltonian expectation value. Currently, only "weighted_random_sampling" is supported. The default value is None, which disables the random sampling behaviour.
stepsize (float) –
The learning rate \(\eta\). The learning rate must be such that \(\eta < 2/L = 2/\sum_i|c_i|\), where:
\(L \leq \sum_i|c_i|\) is the bound on the Lipschitz constant of the variational quantum algorithm objective function, and
\(c_i\) are the coefficients of the Hamiltonian used in the objective function.
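The bound on the learning rate can be checked by hand before constructing the optimizer. The following is a minimal sketch in plain Python, using the coefficients from the example on this page; the variable names are illustrative and are not part of the PennyLane API.

```python
# Hamiltonian coefficients taken from the example on this page.
coeffs = [2, 4, -1, 5, 2]

# Upper bound on the Lipschitz constant: L <= sum_i |c_i|.
lipschitz_bound = sum(abs(c) for c in coeffs)

# Largest admissible learning rate according to eta < 2 / L.
max_stepsize = 2 / lipschitz_bound

stepsize = 0.07  # default value from the class signature
stepsize_ok = stepsize < max_stepsize

print(lipschitz_bound, round(max_stepsize, 4), stepsize_ok)  # 14 0.1429 True
```

For this Hamiltonian the default stepsize of 0.07 sits below the bound of roughly 0.143, so it satisfies the condition.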
Example
For VQE/VQE-like problems, the objective function for the optimizer can be realized as a QNode object measuring the expectation of a LinearCombination.
>>> import pennylane as qml
>>> from pennylane import numpy as np
>>> coeffs = [2, 4, -1, 5, 2]
>>> obs = [
...     qml.X(1),
...     qml.Z(1),
...     qml.X(0) @ qml.X(1),
...     qml.Y(0) @ qml.Y(1),
...     qml.Z(0) @ qml.Z(1)
... ]
>>> H = qml.Hamiltonian(coeffs, obs)
>>> dev = qml.device("default.qubit", wires=2, shots=100)
>>> @qml.qnode(dev)
... def cost(weights):
...     qml.StronglyEntanglingLayers(weights, wires=range(2))
...     return qml.expval(H)
Once constructed, the cost function can be passed directly to the optimizer’s step method. The attributes opt.shots_used and opt.total_shots_used can be used to track the number of shots per iteration, and across the life of the optimizer, respectively.
>>> shape = qml.templates.StronglyEntanglingLayers.shape(n_layers=2, n_wires=2)
>>> params = np.random.random(shape)
>>> opt = qml.ShotAdaptiveOptimizer(min_shots=10, term_sampling="weighted_random_sampling")
>>> for i in range(60):
...     params = opt.step(cost, params)
...     print(f"Step {i}: cost = {cost(params):.2f}, shots_used = {opt.total_shots_used}")
Step 0: cost = -5.69, shots_used = 240
Step 1: cost = -2.98, shots_used = 336
Step 2: cost = -4.97, shots_used = 624
Step 3: cost = -5.53, shots_used = 1054
Step 4: cost = -6.50, shots_used = 1798
Step 5: cost = -6.68, shots_used = 2942
Step 6: cost = -6.99, shots_used = 4350
Step 7: cost = -6.97, shots_used = 5814
Step 8: cost = -7.00, shots_used = 7230
Step 9: cost = -6.69, shots_used = 9006
Step 10: cost = -6.85, shots_used = 11286
Step 11: cost = -6.63, shots_used = 14934
Step 12: cost = -6.86, shots_used = 17934
Step 13: cost = -7.19, shots_used = 22950
Step 14: cost = -6.99, shots_used = 28302
Step 15: cost = -7.38, shots_used = 34134
Step 16: cost = -7.66, shots_used = 41022
Step 17: cost = -7.21, shots_used = 48918
Step 18: cost = -7.53, shots_used = 56286
Step 19: cost = -7.46, shots_used = 63822
Step 20: cost = -7.31, shots_used = 72534
Step 21: cost = -7.23, shots_used = 82014
Step 22: cost = -7.31, shots_used = 92838
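The printed values are cumulative totals, so the shots spent in each individual step can be recovered as successive differences. The sketch below does this in plain Python for the first six totals from the output above. It is only an illustration of the relationship between the cumulative and per-iteration counts, and the list names are hypothetical.

```python
# Cumulative totals copied from the first six lines of the output above.
total_shots = [240, 336, 624, 1054, 1798, 2942]

# Shots spent in each step: the first total, then successive differences.
per_step = [total_shots[0]] + [
    later - earlier for earlier, later in zip(total_shots, total_shots[1:])
]

print(per_step)  # [240, 96, 288, 430, 744, 1144]
```

In a live run, reading opt.shots_used after each call to step should make this bookkeeping unnecessary.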
Usage Details
The shot adaptive optimizer is based on the iCANS1 optimizer by Kübler et al. (2020), and works as follows:
The initial step of the optimizer is performed with some specified minimum number of shots, \(s_{min}\), for all partial derivatives.
The parameter-shift rule is then used to estimate the gradient \(g_i\) with \(s_i\) shots for each parameter \(\theta_i\), as well as the variances \(v_i\) of the estimated gradients.
Gradient descent is performed for each parameter \(\theta_i\), using the pre-defined learning rate \(\eta\) and the gradient information \(g_i\): \(\theta_i \rightarrow \theta_i - \eta g_i\).
A maximum shot number is set by maximizing the improvement in the expected gain per shot. For a specific parameter value, the improvement in the expected gain per shot is then calculated via
\[\gamma_i = \frac{1}{s_i} \left[ \left(\eta - \frac{1}{2} L\eta^2\right) g_i^2 - \frac{L\eta^2}{2s_i}v_i \right],\]
where:
\(L \leq \sum_i|c_i|\) is the bound on the Lipschitz constant of the variational quantum algorithm objective function,
\(c_i\) are the coefficients of the Hamiltonian, and
\(\eta\) is the learning rate, and must be bound such that \(\eta < 2/L\) for the above expression to hold.
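To get a feel for the expected gain per shot, the expression for \(\gamma_i\) can be evaluated at a few shot counts. The following is a minimal sketch in plain Python. The function name expected_gain_per_shot and the values chosen for \(g_i\) and \(v_i\) are made up for illustration, while \(L = 14\) and \(\eta = 0.07\) match the example Hamiltonian and the default stepsize.

```python
def expected_gain_per_shot(g, v, s, lipschitz, eta):
    """Evaluate gamma = (1/s) * [(eta - L*eta^2/2) * g^2 - L*eta^2 * v / (2*s)]."""
    signal_term = (eta - 0.5 * lipschitz * eta**2) * g**2
    noise_term = lipschitz * eta**2 * v / (2 * s)
    return (signal_term - noise_term) / s


# Hypothetical gradient estimate and variance for a single parameter.
g, v = 0.5, 4.0

gains = {
    s: expected_gain_per_shot(g, v, s, lipschitz=14.0, eta=0.07)
    for s in (5, 10, 20, 40, 80)
}

best_s = max(gains, key=gains.get)
print(best_s)  # 40
```

With too few shots the noise term dominates and the gain is negative. With too many shots the gain is diluted by the \(1/s_i\) prefactor. The maximum lies in between, which is what the shot update rule exploits.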
Finally, the new value of \(s_{i+1}\) (the number of shots for the partial derivative with respect to parameter \(\theta_i\)) is given by:
\[s_{i+1} = \frac{2L\eta}{2-L\eta}\left(\frac{v_i}{g_i^2}\right)\propto \frac{v_i}{g_i^2}.\]
In addition to the above, to counteract the presence of noise in the system, running averages of \(g_i\) and \(v_i\) (\(\chi_i\) and \(\xi_i\) respectively) are used when computing \(\gamma_i\) and \(s_i\).
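The shot update and the running averages can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the library's implementation. The function names are hypothetical, the running average is assumed to be a simple exponential moving average controlled by \(\mu\), and the bias \(b\) is assumed to enter as a small additive term in the denominator to avoid division by zero.

```python
import math


def update_running_average(avg, new_value, mu):
    """Exponential moving average (assumed form): mu * avg + (1 - mu) * new_value."""
    return mu * avg + (1 - mu) * new_value


def recommended_shots(chi, xi, lipschitz, eta, b=1e-6):
    """Shots from s = 2*L*eta / (2 - L*eta) * xi / (chi^2 + b), rounded up.

    chi and xi stand for the running averages of the gradient and its variance.
    The placement of b is an assumption made for this sketch.
    """
    prefactor = 2 * lipschitz * eta / (2 - lipschitz * eta)
    return math.ceil(prefactor * xi / (chi**2 + b))


mu, lipschitz, eta = 0.99, 14.0, 0.07

# Running averages seeded with a first (made-up) gradient and variance estimate.
chi, xi = 0.5, 4.0
first_recommendation = recommended_shots(chi, xi, lipschitz, eta)

# Fold in two further made-up (gradient, variance) estimates.
for g, v in [(0.4, 3.0), (0.3, 3.5)]:
    chi = update_running_average(chi, g, mu)
    xi = update_running_average(xi, v, mu)

shots = recommended_shots(chi, xi, lipschitz, eta)
print(first_recommendation, shots)
```

Because \(\mu\) is close to 1, a single noisy gradient estimate changes the recommended shot count only slightly.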
For more details, see:
Andrew Arrasmith, Lukasz Cincio, Rolando D. Somma, and Patrick J. Coles. “Operator Sampling for Shot-frugal Optimization in Variational Algorithms.” arXiv:2004.06252 (2020).
Jonas M. Kübler, Andrew Arrasmith, Lukasz Cincio, and Patrick J. Coles. “An Adaptive Optimizer for Measurement-Frugal Variational Algorithms.” Quantum 4, 263 (2020).
Methods
apply_grad(grad, args) – Update the variables to take a single optimization step.
check_device(dev) – Verifies that the device used by the objective function is non-analytic.
check_learning_rate(coeffs) – Verifies that the learning rate is less than 2 over the Lipschitz constant, where the Lipschitz constant is given by \(\sum |c_i|\) for Hamiltonian coefficients \(c_i\).
compute_grad(objective_fn, args, kwargs) – Compute the gradient of the objective function, as well as the variance of the gradient, at the given point.
qnode_weighted_random_sampling(qnode, ...) – Returns an array of length shots containing single-shot estimates of the Hamiltonian gradient.
step(objective_fn, *args, **kwargs) – Update trainable arguments with one step of the optimizer.
step_and_cost(objective_fn, *args, **kwargs) – Update trainable arguments with one step of the optimizer and return the corresponding objective function value prior to the step.
- apply_grad(grad, args)
Update the variables to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.
- Parameters
grad (tuple [array]) – the gradient of the objective function at point \(x^{(t)}\): \(\nabla f(x^{(t)})\)
args (tuple) – the current value of the variables \(x^{(t)}\)
- Returns
the new values \(x^{(t+1)}\)
- Return type
list [array]
- static check_device(dev)
Verifies that the device used by the objective function is non-analytic.
- Parameters
dev (devices.Device) – the device to verify
- Raises
ValueError – if the device is analytic
- check_learning_rate(coeffs)
Verifies that the learning rate is less than 2 over the Lipschitz constant, where the Lipschitz constant is given by \(\sum |c_i|\) for Hamiltonian coefficients \(c_i\).
- Parameters
coeffs (Sequence[float]) – the coefficients of the terms in the Hamiltonian
- Raises
ValueError – if the learning rate is larger than \(2/\sum |c_i|\)
- compute_grad(objective_fn, args, kwargs)
Compute the gradient of the objective function, as well as the variance of the gradient, at the given point.
- Parameters
objective_fn (function) – the objective function for optimization
args – arguments to the objective function
kwargs – keyword arguments to the objective function
- Returns
a tuple of NumPy arrays containing the gradient \(\nabla f(x^{(t)})\) and the variance of the gradient
- Return type
tuple[array[float], array[float]]
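As a rough picture of what the returned pair contains, the gradient and its variance for one parameter can be thought of as the mean and the sample variance of a set of single-shot gradient estimates. The sketch below uses Python's statistics module with made-up numbers. Whether the library normalises the variance by \(n - 1\) is an assumption here.

```python
import statistics

# Made-up single-shot gradient estimates for one parameter.
single_shot_grads = [0.8, -0.2, 0.5, 0.1, 0.3]

# Gradient estimate: mean of the single-shot values.
grad = statistics.fmean(single_shot_grads)

# Variance estimate: sample variance (normalised by n - 1).
grad_variance = statistics.variance(single_shot_grads)

print(round(grad, 3), round(grad_variance, 3))  # 0.3 0.145
```

A sample variance is undefined for a single value (statistics.variance raises StatisticsError in that case), which is consistent with the requirement that min_shots be large enough for the gradient variances to be computed.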
- static qnode_weighted_random_sampling(qnode, coeffs, observables, shots, argnums, *args, **kwargs)
Returns an array of length shots containing single-shot estimates of the Hamiltonian gradient. The shots are distributed randomly over the terms in the Hamiltonian, as per a multinomial distribution.
- Parameters
qnode (QNode) – A QNode that returns the expectation value of a Hamiltonian.
coeffs (List[float]) – The coefficients of the Hamiltonian being measured
observables (List[Observable]) – The terms of the Hamiltonian being measured
shots (int) – The number of shots used to estimate the Hamiltonian expectation value. These shots are distributed over the terms in the Hamiltonian, as per a multinomial distribution.
argnums (Sequence[int]) – the QNode argument indices which are trainable
*args – Arguments to the QNode
**kwargs – Keyword arguments to the QNode
- Returns
the single-shot gradients of the Hamiltonian expectation value
- Return type
array[float]
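The multinomial distribution of shots over Hamiltonian terms can be illustrated without a quantum device. The following sketch assumes that each term is selected with probability \(|c_i| / \sum_j |c_j|\), as in the weighted random sampling scheme of Arrasmith et al. The function distribute_shots is hypothetical and is not part of PennyLane.

```python
import random
from collections import Counter


def distribute_shots(coeffs, shots, seed=None):
    """Draw a multinomial split of `shots` over terms, weighted by |c_i|."""
    rng = random.Random(seed)
    weights = [abs(c) for c in coeffs]
    # One weighted draw per shot; counting the draws gives a multinomial sample.
    draws = rng.choices(range(len(coeffs)), weights=weights, k=shots)
    counts = Counter(draws)
    return [counts[i] for i in range(len(coeffs))]


# Coefficients from the example Hamiltonian on this page.
coeffs = [2, 4, -1, 5, 2]
shots_per_term = distribute_shots(coeffs, shots=100, seed=0)
print(shots_per_term, sum(shots_per_term))
```

On average, terms with larger \(|c_i|\) receive more shots, while the total always equals the requested shot budget.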
- step(objective_fn, *args, **kwargs)
Update trainable arguments with one step of the optimizer.
- Parameters
objective_fn (function) – the objective function for optimization
*args – variable length argument list for objective function
**kwargs – variable-length keyword arguments for the objective function
- Returns
The new variable values \(x^{(t+1)}\). If a single argument is provided, list[array] is replaced by array.
- Return type
list[array]
- step_and_cost(objective_fn, *args, **kwargs)
Update trainable arguments with one step of the optimizer and return the corresponding objective function value prior to the step.
The objective function will be evaluated using the maximum number of shots across all parameters as determined by the optimizer during the optimization step.
Warning
Unlike other gradient descent optimizers, the objective function is evaluated separately from the gradient computation, which results in extra device evaluations.
- Parameters
objective_fn (function) – the objective function for optimization
*args – variable length argument list for objective function
**kwargs – variable-length keyword arguments for the objective function
- Returns
the new variable values \(x^{(t+1)}\) and the objective function output prior to the step. If a single argument is provided, list[array] is replaced by array.
- Return type
tuple[list [array], float]