qml.gradients.param_shift
- param_shift(tape, argnum=None, shifts=None, gradient_recipes=None, fallback_fn=<function finite_diff>, f0=None, broadcast=False)
Transform a circuit to compute the parameter-shift gradient of all gate parameters with respect to its inputs.
- Parameters
tape (QNode or QuantumTape) – quantum circuit to differentiate
argnum (int or list[int] or None) – Trainable parameter indices to differentiate with respect to. If not provided, the derivatives with respect to all trainable indices are returned.
shifts (list[tuple[int or float]]) – List containing tuples of shift values. If provided, one tuple of shifts should be given per trainable parameter and the tuple should match the number of frequencies for that parameter. If unspecified, equidistant shifts are assumed.
gradient_recipes (tuple(list[list[float]] or None)) –
List of gradient recipes for the parameter-shift method. One gradient recipe must be provided per trainable parameter.
This is a tuple with one nested list per parameter. For parameter \(\phi_k\), the nested list contains elements of the form \([c_i, a_i, s_i]\) where \(i\) is the index of the term, resulting in a gradient recipe of
\[\frac{\partial}{\partial\phi_k}f = \sum_{i} c_i f(a_i \phi_k + s_i).\]
If None, the default gradient recipe containing the two terms \([c_0, a_0, s_0]=[1/2, 1, \pi/2]\) and \([c_1, a_1, s_1]=[-1/2, 1, -\pi/2]\) is assumed for every parameter. (A usage sketch is given after the return type below.)
fallback_fn (None or Callable) – a fallback gradient function to use for any parameters that do not support the parameter-shift rule.
f0 (tensor_like[float] or None) – Output of the evaluated input tape. If provided, and the gradient recipe contains an unshifted term, this value is used, saving a quantum evaluation.
broadcast (bool) – Whether or not to use parameter broadcasting to create a single broadcasted tape per operation instead of one tape per shift angle.
- Returns
The transformed circuit as described in qml.transform. Executing this circuit will provide the Jacobian in the form of a tensor, a tuple, or a nested tuple depending upon the nesting structure of measurements in the original circuit.
- Return type
qnode (QNode) or tuple[List[QuantumTape], function]
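As a hedged sketch (not part of the original reference), a custom recipe can be passed per trainable parameter at the tape level; the explicit two-term recipe below simply reproduces the default rule:

import pennylane as qml
import numpy as np

ops = [qml.RX(0.1, wires=0), qml.RY(0.2, wires=0)]
tape = qml.tape.QuantumScript(ops, [qml.expval(qml.Z(0))])

# One recipe per trainable parameter; each term has the form [c_i, a_i, s_i].
recipe = [[0.5, 1.0, np.pi / 2], [-0.5, 1.0, -np.pi / 2]]
gradient_tapes, fn = qml.gradients.param_shift(tape, gradient_recipes=(recipe, recipe))

dev = qml.device("default.qubit")
grad = fn(qml.execute(gradient_tapes, dev, None))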
For a variational evolution \(U(\mathbf{p}) \vert 0\rangle\) with \(N\) parameters \(\mathbf{p}\), consider the expectation value of an observable \(O\):
\[f(\mathbf{p}) = \langle \hat{O} \rangle(\mathbf{p}) = \langle 0 \vert U(\mathbf{p})^\dagger \hat{O} U(\mathbf{p}) \vert 0\rangle.\]
The gradient of this expectation value can be calculated via the parameter-shift rule:
\[\frac{\partial f}{\partial \mathbf{p}} = \sum_{\mu=1}^{2R} f\left(\mathbf{p}+\frac{2\mu-1}{2R}\pi\right) \frac{(-1)^{\mu-1}}{4R\sin^2\left(\frac{2\mu-1}{4R}\pi\right)}\]
Here, \(R\) is the number of frequencies with which the parameter \(\mathbf{p}\) enters the function \(f\) via the operation \(U\), and we assumed that these frequencies are equidistant. For more general shift rules, both regarding the shifts and the frequencies, and for more technical details, see Vidal and Theis (2018) and Wierichs et al. (2022).
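For a single-frequency gate such as RX (\(R=1\)), the rule above reduces to the familiar two-term recipe \(\partial_\theta f = \frac{1}{2}\left[f(\theta+\pi/2) - f(\theta-\pi/2)\right]\). The following is a quick, illustrative sanity check (the QNode f and the value 0.4 are arbitrary choices, not part of the original reference):

import pennylane as qml
import numpy as np

dev = qml.device("default.qubit")

@qml.qnode(dev)
def f(theta):
    qml.RX(theta, wires=0)
    return qml.expval(qml.Z(0))   # f(theta) = cos(theta)

theta = 0.4
# Two-term parameter-shift rule (R = 1, equidistant shifts of pi/2)
shift_rule = 0.5 * (f(theta + np.pi / 2) - f(theta - np.pi / 2))
analytic = -np.sin(theta)         # exact derivative of cos(theta)
print(np.isclose(shift_rule, analytic))  # True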
Gradients of variances
For a variational evolution \(U(\mathbf{p}) \vert 0\rangle\) with \(N\) parameters \(\mathbf{p}\), consider the variance of an observable \(O\):
\[g(\mathbf{p})=\langle \hat{O}^2 \rangle (\mathbf{p}) - [\langle \hat{O} \rangle(\mathbf{p})]^2.\]
We can relate this directly to the parameter-shift rule by noting that
\[\frac{\partial g}{\partial \mathbf{p}}= \frac{\partial}{\partial \mathbf{p}} \langle \hat{O}^2 \rangle (\mathbf{p}) - 2 f(\mathbf{p}) \frac{\partial f}{\partial \mathbf{p}}.\]
The derivatives in the expression on the right hand side can be computed via the shift rule as above, allowing for the computation of the variance derivative.
In the case where \(O\) is involutory (\(\hat{O}^2 = I\)), the first term in the above expression vanishes, and we are simply left with
\[\frac{\partial g}{\partial \mathbf{p}} = - 2 f(\mathbf{p}) \frac{\partial f}{\partial \mathbf{p}}.\]
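As a hedged, illustrative check (the circuit names expval_circuit and var_circuit below are placeholders, not part of the API): for a Pauli-Z measurement the observable is involutory, so \(g = 1 - f^2\) and the parameter-shift gradient of the variance should match \(-2 f\, \partial f/\partial\mathbf{p}\):

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit")

@qml.qnode(dev, diff_method="parameter-shift")
def expval_circuit(params):
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=0)
    return qml.expval(qml.Z(0))

@qml.qnode(dev, diff_method="parameter-shift")
def var_circuit(params):
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=0)
    return qml.var(qml.Z(0))

params = np.array([0.1, 0.2], requires_grad=True)
f = expval_circuit(params)                # expectation value f(p)
df = qml.grad(expval_circuit)(params)     # parameter-shift gradient of f
dg = qml.grad(var_circuit)(params)        # parameter-shift gradient of the variance

print(np.allclose(dg, -2 * f * df))       # True, since Z is involutory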
Example
This transform can be registered directly as the quantum gradient transform to use during autodifferentiation:
from pennylane import numpy as np

dev = qml.device("default.qubit")

@qml.qnode(dev, interface="autograd", diff_method="parameter-shift")
def circuit(params):
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=0)
    qml.RX(params[2], wires=0)
    return qml.expval(qml.Z(0))
>>> params = np.array([0.1, 0.2, 0.3], requires_grad=True)
>>> qml.jacobian(circuit)(params)
array([-0.3875172 , -0.18884787, -0.38355704])
When differentiating QNodes with multiple measurements using Autograd or TensorFlow, the outputs of the QNode first need to be stacked. The reason is that those two frameworks only allow differentiating functions with array or tensor outputs, instead of functions that output sequences. In contrast, Jax and Torch require no additional post-processing.
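For instance, with Autograd the two measurement results can be stacked into one array before calling qml.jacobian. This is a minimal sketch; the wrapper stacked_circuit is only illustrative:

from pennylane import numpy as np

dev = qml.device("default.qubit")

@qml.qnode(dev, interface="autograd", diff_method="parameter-shift")
def circuit(params):
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=0)
    qml.RX(params[2], wires=0)
    return qml.expval(qml.Z(0)), qml.var(qml.Z(0))

def stacked_circuit(params):
    # Stack the sequence of results into a single array so that
    # Autograd can differentiate the function.
    return np.stack(circuit(params))

params = np.array([0.1, 0.2, 0.3], requires_grad=True)
jac = qml.jacobian(stacked_circuit)(params)  # Jacobian of shape (2, 3)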
import jax

dev = qml.device("default.qubit")

@qml.qnode(dev, interface="jax", diff_method="parameter-shift")
def circuit(params):
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=0)
    qml.RX(params[2], wires=0)
    return qml.expval(qml.Z(0)), qml.var(qml.Z(0))
>>> params = jax.numpy.array([0.1, 0.2, 0.3])
>>> jax.jacobian(circuit)(params)
(Array([-0.3875172 , -0.18884787, -0.38355704], dtype=float64),
 Array([0.69916862, 0.34072424, 0.69202359], dtype=float64))
Note

param_shift performs multiple attempts to obtain the gradient recipes for each operation:

- If an operation has a custom grad_recipe defined, it is used.
- If parameter_frequencies yields a result, the frequencies are used to construct the general parameter-shift rule via generate_shift_rule(). Note that by default, the generator is used to compute the parameter frequencies if they are not provided via a custom implementation.
That is, the order of precedence is grad_recipe, custom parameter_frequencies, and finally generator() via the default implementation of the frequencies.
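As a hedged illustration of the second item above, the frequencies an operation reports can be inspected directly; the CRX example below is purely illustrative:

import pennylane as qml

op = qml.CRX(0.3, wires=[0, 1])
# One tuple of frequencies per trainable parameter; param_shift uses these
# to build the general shift rule when no custom grad_recipe is defined.
print(op.parameter_frequencies)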
Warning

Note that as the option broadcast=True adds a broadcasting dimension, it is not compatible with circuits that are already broadcasted. In addition, operations with trainable parameters are required to support broadcasting. One way to check this is through the supports_broadcasting attribute:

>>> qml.RX in qml.ops.qubit.attributes.supports_broadcasting
True
Usage Details
This gradient transform can be applied directly to QNode objects. However, for performance reasons, we recommend providing the gradient transform as the diff_method argument of the QNode decorator, and differentiating with your preferred machine learning framework.

@qml.qnode(dev)
def circuit(params):
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=0)
    qml.RX(params[2], wires=0)
    return qml.expval(qml.Z(0)), qml.var(qml.Z(0))
>>> qml.gradients.param_shift(circuit)(params)
(Array([-0.3875172 , -0.18884787, -0.38355704], dtype=float64),
 Array([0.69916862, 0.34072424, 0.69202359], dtype=float64))
This quantum gradient transform can also be applied to low-level QuantumTape objects. This will result in no implicit quantum device evaluation. Instead, the processed tapes and the post-processing function, which together define the gradient, are returned directly:

>>> ops = [qml.RX(params[0], 0), qml.RY(params[1], 0), qml.RX(params[2], 0)]
>>> measurements = [qml.expval(qml.Z(0)), qml.var(qml.Z(0))]
>>> tape = qml.tape.QuantumTape(ops, measurements)
>>> gradient_tapes, fn = qml.gradients.param_shift(tape)
>>> gradient_tapes
[<QuantumScript: wires=[0], params=3>, <QuantumScript: wires=[0], params=3>,
 <QuantumScript: wires=[0], params=3>, <QuantumScript: wires=[0], params=3>,
 <QuantumScript: wires=[0], params=3>, <QuantumScript: wires=[0], params=3>,
 <QuantumScript: wires=[0], params=3>]
This can be useful if the underlying circuits representing the gradient computation need to be analyzed.
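For instance, the generated tapes from the snippet above can be drawn, or their shifted parameters printed. This is a minimal sketch; qml.drawer.tape_text is used here purely for illustration:

# Inspect how the first gradient tape differs from the original tape
print(gradient_tapes[0].get_parameters())
print(qml.drawer.tape_text(gradient_tapes[0], decimals=2))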
Note that argnum refers to the index of a parameter within the list of trainable parameters. For example, if we have:

>>> tape = qml.tape.QuantumScript(
...     [qml.RX(1.2, wires=0), qml.RY(2.3, wires=0), qml.RZ(3.4, wires=0)],
...     [qml.expval(qml.Z(0))],
...     trainable_params = [1, 2]
... )
>>> qml.gradients.param_shift(tape, argnum=1)
The code above will differentiate the third parameter rather than the second.
The output tapes can then be evaluated and post-processed to retrieve the gradient:
>>> dev = qml.device("default.qubit")
>>> fn(qml.execute(gradient_tapes, dev, None))
((Array(-0.3875172, dtype=float64), Array(-0.18884787, dtype=float64), Array(-0.38355704, dtype=float64)),
 (Array(0.69916862, dtype=float64), Array(0.34072424, dtype=float64), Array(0.69202359, dtype=float64)))
This gradient transform is compatible with devices that use shot vectors for execution.
shots = (10, 100, 1000)
dev = qml.device("default.qubit", shots=shots)

@qml.qnode(dev)
def circuit(params):
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=0)
    qml.RX(params[2], wires=0)
    return qml.expval(qml.Z(0)), qml.var(qml.Z(0))
>>> params = np.array([0.1, 0.2, 0.3], requires_grad=True)
>>> qml.gradients.param_shift(circuit)(params)
((array([-0.2, -0.1, -0.4]), array([0.4, 0.2, 0.8])),
 (array([-0.4 , -0.24, -0.43]), array([0.672 , 0.4032, 0.7224])),
 (array([-0.399, -0.179, -0.387]), array([0.722988, 0.324348, 0.701244])))
The outermost tuple contains results corresponding to each element of the shot vector.
When setting the keyword argument broadcast to True, the shifted circuit evaluations for each operation are batched together, resulting in broadcasted tapes:

>>> params = np.array([0.1, 0.2, 0.3], requires_grad=True)
>>> ops = [qml.RX(params[0], 0), qml.RY(params[1], 0), qml.RX(params[2], 0)]
>>> measurements = [qml.expval(qml.Z(0))]
>>> tape = qml.tape.QuantumTape(ops, measurements)
>>> gradient_tapes, fn = qml.gradients.param_shift(tape, broadcast=True)
>>> len(gradient_tapes)
3
>>> [t.batch_size for t in gradient_tapes]
[2, 2, 2]
The postprocessing function will know that broadcasting is used and handle the results accordingly:
>>> fn(qml.execute(gradient_tapes, dev, None))
(tensor(-0.3875172, requires_grad=True),
 tensor(-0.18884787, requires_grad=True),
 tensor(-0.38355704, requires_grad=True))
An advantage of using broadcast=True is a speedup:

import timeit

@qml.qnode(qml.device("default.qubit"))
def circuit(params):
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=0)
    qml.RX(params[2], wires=0)
    return qml.expval(qml.Z(0))
>>> number = 100
>>> serial_call = "qml.gradients.param_shift(circuit, broadcast=False)(params)"
>>> timeit.timeit(serial_call, globals=globals(), number=number) / number
0.020183045039993887
>>> broadcasted_call = "qml.gradients.param_shift(circuit, broadcast=True)(params)"
>>> timeit.timeit(broadcasted_call, globals=globals(), number=number) / number
0.01244492811998498
This speedup grows with the number of shifts and qubits until all preprocessing and postprocessing overhead becomes negligible. While it will depend strongly on the details of the circuit, at least a small improvement can be expected in most cases. Note that broadcast=True requires additional memory by a factor of the largest batch_size of the created tapes.

Shot vectors and multiple return measurements are supported with broadcast=True.