qml.ShotAdaptiveOptimizer

class ShotAdaptiveOptimizer(min_shots, term_sampling=None, mu=0.99, b=1e-06, stepsize=0.07)[source]

Bases: pennylane.optimize.gradient_descent.GradientDescentOptimizer

Optimizer where the shot rate is adaptively calculated using the variances of the parameter-shift gradient.

By keeping a running average of the parameter-shift gradient and the variance of the parameter-shift gradient, this optimizer frugally distributes a shot budget across the partial derivatives of each parameter.

In addition, weighted random sampling can be used to further distribute the shot budget across the local terms from which the Hamiltonian is constructed.

Note

The shot adaptive optimizer only supports single QNode objects as objective functions. The bound device must also be instantiated with a finite number of shots.

Parameters
  • min_shots (int) – The minimum number of shots used to estimate the expectations of each term in the Hamiltonian. Note that this must be larger than 2 for the variance of the gradients to be computed.

  • mu (float) – The running average constant \(\mu \in [0, 1]\). Used to control how quickly the number of shots recommended for each gradient component changes.

  • b (float) – Regularization bias. The bias should be kept small, but non-zero.

  • term_sampling (str) – The random sampling algorithm to multinomially distribute the shot budget across terms in the Hamiltonian expectation value. Currently, only "weighted_random_sampling" is supported. The default value is None, which disables the random sampling behaviour.

  • stepsize (float) –

    The learning rate \(\eta\). The learning rate must be such that \(\eta < 2/L = 2/\sum_i|c_i|\), where:

    • \(L \leq \sum_i|c_i|\) is the bound on the Lipschitz constant of the variational quantum algorithm objective function, and

    • \(c_i\) are the coefficients of the Hamiltonian used in the objective function.
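The learning-rate condition above can be checked numerically. A minimal sketch with illustrative coefficient values (hypothetical, not tied to any particular Hamiltonian):

```python
import numpy as np

# Illustrative Hamiltonian coefficients (hypothetical values)
coeffs = [2, 4, -1, 5, 2]

# Upper bound on the Lipschitz constant: L <= sum_i |c_i|
lipschitz_bound = np.sum(np.abs(coeffs))

# The learning rate must satisfy eta < 2 / L
max_stepsize = 2 / lipschitz_bound

# The default stepsize of 0.07 satisfies the bound for these coefficients
assert 0.07 < max_stepsize
```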

Example

For VQE/VQE-like problems, the objective function for the optimizer can be realized as a QNode object measuring the expectation of a Hamiltonian.

>>> from pennylane import numpy as np
>>> coeffs = [2, 4, -1, 5, 2]
>>> obs = [
...   qml.X(1),
...   qml.Z(1),
...   qml.X(0) @ qml.X(1),
...   qml.Y(0) @ qml.Y(1),
...   qml.Z(0) @ qml.Z(1)
... ]
>>> H = qml.Hamiltonian(coeffs, obs)
>>> dev = qml.device("default.qubit", wires=2, shots=100)
>>> @qml.qnode(dev)
... def cost(weights):
...     qml.StronglyEntanglingLayers(weights, wires=range(2))
...     return qml.expval(H)

Once constructed, the cost function can be passed directly to the optimizer’s step method. The attributes opt.shots_used and opt.total_shots_used can be used to track the number of shots per iteration, and across the life of the optimizer, respectively.

>>> shape = qml.templates.StronglyEntanglingLayers.shape(n_layers=2, n_wires=2)
>>> params = np.random.random(shape)
>>> opt = qml.ShotAdaptiveOptimizer(min_shots=10, term_sampling="weighted_random_sampling")
>>> for i in range(60):
...    params = opt.step(cost, params)
...    print(f"Step {i}: cost = {cost(params):.2f}, shots_used = {opt.total_shots_used}")
Step 0: cost = -5.69, shots_used = 240
Step 1: cost = -2.98, shots_used = 336
Step 2: cost = -4.97, shots_used = 624
Step 3: cost = -5.53, shots_used = 1054
Step 4: cost = -6.50, shots_used = 1798
Step 5: cost = -6.68, shots_used = 2942
Step 6: cost = -6.99, shots_used = 4350
Step 7: cost = -6.97, shots_used = 5814
Step 8: cost = -7.00, shots_used = 7230
Step 9: cost = -6.69, shots_used = 9006
Step 10: cost = -6.85, shots_used = 11286
Step 11: cost = -6.63, shots_used = 14934
Step 12: cost = -6.86, shots_used = 17934
Step 13: cost = -7.19, shots_used = 22950
Step 14: cost = -6.99, shots_used = 28302
Step 15: cost = -7.38, shots_used = 34134
Step 16: cost = -7.66, shots_used = 41022
Step 17: cost = -7.21, shots_used = 48918
Step 18: cost = -7.53, shots_used = 56286
Step 19: cost = -7.46, shots_used = 63822
Step 20: cost = -7.31, shots_used = 72534
Step 21: cost = -7.23, shots_used = 82014
Step 22: cost = -7.31, shots_used = 92838

The shot adaptive optimizer is based on the iCANS1 optimizer by Kübler et al. (2020), and works as follows:

  1. The initial step of the optimizer is performed with some specified minimum number of shots, \(s_{min}\), for all partial derivatives.

  2. The parameter-shift rule is then used to estimate the gradient \(g_i\) of each parameter \(\theta_i\) with \(s_i\) shots, as well as the variances \(v_i\) of the estimated gradients.

  3. Gradient descent is performed for each parameter \(\theta_i\), using the pre-defined learning rate \(\eta\) and the gradient information \(g_i\): \(\theta_i \rightarrow \theta_i - \eta g_i\).

  4. A maximum shot number is set by maximizing the improvement in the expected gain per shot. For a specific parameter value, the improvement in the expected gain per shot is then calculated via

    \[\gamma_i = \frac{1}{s_i} \left[ \left(\eta - \frac{1}{2} L\eta^2\right) g_i^2 - \frac{L\eta^2}{2s_i}v_i \right],\]

    where:

    • \(L \leq \sum_i|c_i|\) is the bound on the Lipschitz constant of the variational quantum algorithm objective function,

    • \(c_i\) are the coefficients of the Hamiltonian, and

    • \(\eta\) is the learning rate, and must be bound such that \(\eta < 2/L\) for the above expression to hold.

  5. Finally, the new value \(s_{i+1}\) (the number of shots for the partial derivative of parameter \(\theta_i\) at the next iteration) is given by:

    \[s_{i+1} = \frac{2L\eta}{2-L\eta}\left(\frac{v_i}{g_i^2}\right)\propto \frac{v_i}{g_i^2}.\]

In addition to the above, to counteract the presence of noise in the system, running averages of \(g_i\) and \(v_i\) (\(\chi_i\) and \(\xi_i\), respectively) are used when computing \(\gamma_i\) and \(s_i\).
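The update rules in steps 4 and 5 can be sketched numerically. This is a simplified illustration of the formulas above, not the library implementation; in particular, adding the regularization bias \(b\) to the denominator is an assumption made here to keep the expression finite when the gradient estimate is zero.

```python
import numpy as np

def expected_gain_per_shot(g, v, s, eta, L):
    """gamma_i = (1/s_i) * [(eta - L*eta**2/2) * g_i**2 - (L*eta**2 / (2*s_i)) * v_i]."""
    return (1.0 / s) * ((eta - 0.5 * L * eta**2) * g**2 - (L * eta**2 / (2.0 * s)) * v)

def next_shots(g, v, eta, L, b=1e-6):
    """s_{i+1} = (2*L*eta / (2 - L*eta)) * v_i / g_i**2.

    Adding the small bias b to the denominator is an assumption in this
    sketch, so the expression stays finite when the gradient estimate is zero.
    """
    return (2.0 * L * eta / (2.0 - L * eta)) * v / (g**2 + b)

# Example with illustrative numbers (eta < 2/L holds: 0.1 < 0.2)
gamma = expected_gain_per_shot(g=1.0, v=0.5, s=100, eta=0.1, L=10.0)
s_next = next_shots(g=1.0, v=0.5, eta=0.1, L=10.0)
```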

For more details, see:

  • Andrew Arrasmith, Lukasz Cincio, Rolando D. Somma, and Patrick J. Coles. “Operator Sampling for Shot-frugal Optimization in Variational Algorithms.” arXiv:2004.06252 (2020).

  • Jonas M. Kübler, Andrew Arrasmith, Lukasz Cincio, and Patrick J. Coles. “An Adaptive Optimizer for Measurement-Frugal Variational Algorithms.” Quantum 4, 263 (2020).

apply_grad(grad, args)

Update the variables to take a single optimization step.

check_device(dev)

Verifies that the device used by the objective function is non-analytic.

check_learning_rate(coeffs)

Verifies that the learning rate is less than 2 over the Lipschitz constant, where the Lipschitz constant is given by \(\sum |c_i|\) for Hamiltonian coefficients \(c_i\).

compute_grad(objective_fn, args, kwargs)

Compute the gradient of the objective function, as well as the variance of the gradient, at the given point.

qnode_weighted_random_sampling(qnode, ...)

Returns an array of length shots containing single-shot estimates of the Hamiltonian gradient.

step(objective_fn, *args, **kwargs)

Update trainable arguments with one step of the optimizer.

step_and_cost(objective_fn, *args, **kwargs)

Update trainable arguments with one step of the optimizer and return the corresponding objective function value prior to the step.

apply_grad(grad, args)

Update the variables to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.

Parameters
  • grad (tuple [array]) – the gradient of the objective function at point \(x^{(t)}\): \(\nabla f(x^{(t)})\)

  • args (tuple) – the current value of the variables \(x^{(t)}\)

Returns

the new values \(x^{(t+1)}\)

Return type

list[array]

static check_device(dev)[source]

Verifies that the device used by the objective function is non-analytic.

Parameters

dev (devices.Device) – the device to verify

Raises

ValueError – if the device is analytic

check_learning_rate(coeffs)[source]

Verifies that the learning rate is less than 2 over the Lipschitz constant, where the Lipschitz constant is given by \(\sum |c_i|\) for Hamiltonian coefficients \(c_i\).

Parameters

coeffs (Sequence[float]) – the coefficients of the terms in the Hamiltonian

Raises

ValueError – if the learning rate is larger than \(2/\sum |c_i|\)

compute_grad(objective_fn, args, kwargs)[source]

Compute the gradient of the objective function, as well as the variance of the gradient, at the given point.

Parameters
  • objective_fn (function) – the objective function for optimization

  • args – arguments to the objective function

  • kwargs – keyword arguments to the objective function

Returns

a tuple of NumPy arrays containing the gradient \(\nabla f(x^{(t)})\) and the variance of the gradient

Return type

tuple[array[float], array[float]]
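Conceptually, the gradient estimate and its variance can be obtained from a collection of single-shot gradient samples. A minimal sketch with synthetic data standing in for parameter-shift samples (the distribution parameters here are arbitrary):

```python
import numpy as np

# Synthetic single-shot gradient estimates for one parameter; in practice
# these would come from the parameter-shift rule on a shot-based device.
rng = np.random.default_rng(42)
single_shot_grads = rng.normal(loc=-0.3, scale=0.8, size=25)

g = single_shot_grads.mean()          # gradient estimate g_i
v = single_shot_grads.var(ddof=1)     # variance v_i (sample variance, needs s_i > 1)
```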

static qnode_weighted_random_sampling(qnode, coeffs, observables, shots, argnums, *args, **kwargs)[source]

Returns an array of length shots containing single-shot estimates of the Hamiltonian gradient. The shots are distributed randomly over the terms in the Hamiltonian, as per a multinomial distribution.

Parameters
  • qnode (QNode) – A QNode that returns the expectation value of a Hamiltonian.

  • coeffs (List[float]) – The coefficients of the Hamiltonian being measured

  • observables (List[Observable]) – The terms of the Hamiltonian being measured

  • shots (int) – The number of shots used to estimate the Hamiltonian expectation value. These shots are distributed over the terms in the Hamiltonian, as per a multinomial distribution.

  • argnums (Sequence[int]) – the QNode argument indices which are trainable

  • *args – Arguments to the QNode

  • **kwargs – Keyword arguments to the QNode

Returns

the single-shot gradients of the Hamiltonian expectation value

Return type

array[float]
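The multinomial shot distribution described above can be sketched as follows, with shots allocated in proportion to the magnitude of each Hamiltonian coefficient (a plausible weighting for weighted random sampling; the coefficient values are illustrative only):

```python
import numpy as np

coeffs = np.array([2.0, 4.0, -1.0, 5.0, 2.0])  # illustrative coefficients
shots = 100

# Weight each term by the magnitude of its coefficient
probs = np.abs(coeffs) / np.sum(np.abs(coeffs))

# Draw the per-term shot allocation from a multinomial distribution
rng = np.random.default_rng(0)
shots_per_term = rng.multinomial(shots, probs)

# Every shot is assigned to exactly one term
assert shots_per_term.sum() == shots
```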

step(objective_fn, *args, **kwargs)[source]

Update trainable arguments with one step of the optimizer.

Parameters
  • objective_fn (function) – the objective function for optimization

  • *args – variable length argument list for objective function

  • **kwargs – variable length of keyword arguments for the objective function

Returns

The new variable values \(x^{(t+1)}\). If a single trainable argument is provided, a single array is returned instead of a list.

Return type

list[array]

step_and_cost(objective_fn, *args, **kwargs)[source]

Update trainable arguments with one step of the optimizer and return the corresponding objective function value prior to the step.

The objective function will be evaluated using the maximum number of shots across all parameters as determined by the optimizer during the optimization step.

Warning

Unlike other gradient descent optimizers, the objective function will be evaluated separately to the gradient computation, and will result in extra device evaluations.

Parameters
  • objective_fn (function) – the objective function for optimization

  • *args – variable length argument list for objective function

  • **kwargs – variable length of keyword arguments for the objective function

Returns

the new variable values \(x^{(t+1)}\) and the objective function output prior to the step. If a single trainable argument is provided, a single array is returned instead of a list.

Return type

tuple[list[array], float]
