qml.gradients.param_shift
param_shift(tape, argnum=None, shifts=None, gradient_recipes=None, fallback_fn=<function finite_diff>, f0=None, broadcast=False, shots=None)
Transform a QNode to compute the parameter-shift gradient of all gate parameters with respect to its inputs.
Parameters
tape (pennylane.QNode or QuantumTape) – quantum tape or QNode to differentiate
argnum (int or list[int] or None) – Trainable parameter indices to differentiate with respect to. If not provided, the derivatives with respect to all trainable indices are returned.
shifts (list[tuple[int or float]]) – List containing tuples of shift values. If provided, one tuple of shifts should be given per trainable parameter, and the length of each tuple should match the number of frequencies of that parameter. If unspecified, equidistant shifts are assumed.
gradient_recipes (tuple(list[list[float]] or None)) – List of gradient recipes for the parameter-shift method. One gradient recipe must be provided per trainable parameter.
This is a tuple with one nested list per parameter. For parameter \(\phi_k\), the nested list contains elements of the form \([c_i, a_i, s_i]\), where \(i\) is the index of the term, resulting in the gradient recipe
\[\frac{\partial}{\partial\phi_k}f = \sum_{i} c_i f(a_i \phi_k + s_i).\]
If None, the default gradient recipe containing the two terms \([c_0, a_0, s_0]=[1/2, 1, \pi/2]\) and \([c_1, a_1, s_1]=[-1/2, 1, -\pi/2]\) is assumed for every parameter.
fallback_fn (None or Callable) – a fallback gradient function to use for any parameters that do not support the parameter-shift rule.
f0 (tensor_like[float] or None) – Output of the evaluated input tape. If provided, and the gradient recipe contains an unshifted term, this value is used, saving a quantum evaluation.
broadcast (bool) – Whether or not to use parameter broadcasting to create a single broadcasted tape per operation instead of one tape per shift angle.
shots (None, int, list[int]) – The device shots that will be used to execute the tapes outputted by this transform. Note that this argument does not influence the shots used for tape execution; it only provides information about the shots to the transform.
Returns
If the input is a QNode, an object representing the Jacobian (function) of the QNode that can be executed to obtain the Jacobian. The type of the Jacobian returned is either a tensor, a tuple or a nested tuple depending on the nesting structure of the original QNode output.
If the input is a tape, a tuple containing a list of generated tapes, together with a post-processing function to be applied to the results of the evaluated tapes in order to obtain the Jacobian.
Return type
function or tuple[list[QuantumTape], function]
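As a minimal sketch of how a recipe in the gradient_recipes format \([c_i, a_i, s_i]\) is evaluated, the plain-Python snippet below applies the default two-term recipe to a single-frequency test function. It also applies a recipe with a custom symmetric shift \(\pm s\), using the standard single-frequency (\(R = 1\)) rule \(f'(\phi) = [f(\phi + s) - f(\phi - s)]/(2\sin s)\). The helper names apply_recipe and custom_shift_recipe and the test function are hypothetical illustrations, not PennyLane API.

```python
import math


def apply_recipe(f, phi, recipe):
    """Evaluate sum_i c_i * f(a_i * phi + s_i) for a recipe [[c_i, a_i, s_i], ...]."""
    return sum(c * f(a * phi + s) for c, a, s in recipe)


# Default recipe described above: [1/2, 1, pi/2] and [-1/2, 1, -pi/2].
default_recipe = [[0.5, 1.0, math.pi / 2], [-0.5, 1.0, -math.pi / 2]]


def custom_shift_recipe(s):
    """Hypothetical helper: two-term recipe with shifts +/- s, valid for one frequency (R = 1)."""
    c = 1.0 / (2.0 * math.sin(s))
    return [[c, 1.0, s], [-c, 1.0, -s]]


# Single-frequency test function f(phi) = A cos(phi) + B sin(phi) + C and its exact derivative.
def f(phi):
    return 0.7 * math.cos(phi) - 0.3 * math.sin(phi) + 0.1


def df(phi):
    return -0.7 * math.sin(phi) - 0.3 * math.cos(phi)


phi = 0.4
grad_default = apply_recipe(f, phi, default_recipe)
grad_custom = apply_recipe(f, phi, custom_shift_recipe(0.3))
print(grad_default, grad_custom, df(phi))
```

Both recipes reproduce df(phi) up to floating-point rounding, because the test function contains only a single frequency.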
For a variational evolution \(U(\mathbf{p}) \vert 0\rangle\) with \(N\) parameters \(\mathbf{p}\), consider the expectation value of an observable \(O\):
\[f(\mathbf{p}) = \langle \hat{O} \rangle(\mathbf{p}) = \langle 0 \vert U(\mathbf{p})^\dagger \hat{O} U(\mathbf{p}) \vert 0\rangle.\]
The gradient of this expectation value can be calculated via the parameter-shift rule:
\[\frac{\partial f}{\partial \mathbf{p}} = \sum_{\mu=1}^{2R} f\left(\mathbf{p}+\frac{2\mu-1}{2R}\pi\right) \frac{(-1)^{\mu-1}}{4R\sin^2\left(\frac{2\mu-1}{4R}\pi\right)}\]
Here, \(R\) is the number of frequencies with which the parameter \(\mathbf{p}\) enters the function \(f\) via the operation \(U\), and we have assumed that these frequencies are equidistant, namely \(1, 2, \dots, R\). For more general shift rules, both regarding the shifts and the frequencies, and for more technical details, see Vidal and Theis (2018) and Wierichs et al. (2022).
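The shift rule above can be checked numerically. The sketch below builds the \(2R\) coefficients and shifts, assuming integer frequencies \(1, \dots, R\) and a single scalar parameter, and compares the result with the exact derivative of an arbitrary trigonometric polynomial of degree \(R\). The function names and test coefficients are illustrative assumptions.

```python
import math


def equidistant_shift_rule(R):
    """(coefficient, shift) pairs of the 2R-term rule for the frequencies 1, ..., R."""
    rule = []
    for mu in range(1, 2 * R + 1):
        shift = (2 * mu - 1) * math.pi / (2 * R)
        coeff = (-1) ** (mu - 1) / (4 * R * math.sin(shift / 2) ** 2)
        rule.append((coeff, shift))
    return rule


def shift_rule_derivative(f, x, R):
    """Derivative of f at x via the 2R-term shift rule (exact for frequencies 1..R)."""
    return sum(c * f(x + s) for c, s in equidistant_shift_rule(R))


# Illustrative trigonometric polynomial with frequencies 1, 2, 3 (R = 3).
R = 3
A = [0.2, -0.5, 0.8]
B = [0.4, 0.1, -0.3]


def f(x):
    return 0.05 + sum(A[k - 1] * math.cos(k * x) + B[k - 1] * math.sin(k * x) for k in range(1, R + 1))


def df(x):
    return sum(-k * A[k - 1] * math.sin(k * x) + k * B[k - 1] * math.cos(k * x) for k in range(1, R + 1))


for x in (0.0, 0.37, 1.9):
    print(x, shift_rule_derivative(f, x, R), df(x))
```

For \(R = 1\) the rule has coefficients \(\pm 1/2\) at shifts \(\pi/2\) and \(3\pi/2\); the latter is equivalent to \(-\pi/2\) by \(2\pi\)-periodicity, which recovers the default two-term recipe.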
Gradients of variances
For a variational evolution \(U(\mathbf{p}) \vert 0\rangle\) with \(N\) parameters \(\mathbf{p}\), consider the variance of an observable \(O\):
\[g(\mathbf{p})=\langle \hat{O}^2 \rangle (\mathbf{p}) - [\langle \hat{O} \rangle(\mathbf{p})]^2.\]
We can relate this directly to the parameter-shift rule by noting that
\[\frac{\partial g}{\partial \mathbf{p}}= \frac{\partial}{\partial \mathbf{p}} \langle \hat{O}^2 \rangle (\mathbf{p}) - 2 f(\mathbf{p}) \frac{\partial f}{\partial \mathbf{p}}.\]
The derivatives on the right-hand side can be computed via the shift rule as above, which yields the derivative of the variance.
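A minimal numerical check of this identity, assuming a single qubit prepared as \(R_X(\theta)\vert 0\rangle\) and a diagonal observable specified by its two eigenvalues. With the non-involutory choice \(\hat{O} = \mathrm{diag}(1, 2)\) both terms contribute, and the variance works out to \(\sin^2(\theta)/4\); with \(\hat{O} = Z\) the first term vanishes. The helper names are hypothetical.

```python
import math


def probs(theta):
    """Basis probabilities of RX(theta)|0> = cos(theta/2)|0> - i sin(theta/2)|1>."""
    return math.cos(theta / 2) ** 2, math.sin(theta / 2) ** 2


def moment(theta, eigvals, power):
    """<O^power> for a diagonal observable O with the given eigenvalues."""
    p0, p1 = probs(theta)
    return eigvals[0] ** power * p0 + eigvals[1] ** power * p1


def shift_derivative(fn, theta):
    """Two-term parameter-shift rule (RX enters with a single frequency)."""
    return 0.5 * (fn(theta + math.pi / 2) - fn(theta - math.pi / 2))


def variance_grad(theta, eigvals):
    """d/dtheta Var(O) = d<O^2>/dtheta - 2 <O> d<O>/dtheta."""
    f = moment(theta, eigvals, 1)
    d_second = shift_derivative(lambda t: moment(t, eigvals, 2), theta)
    d_first = shift_derivative(lambda t: moment(t, eigvals, 1), theta)
    return d_second - 2 * f * d_first


theta = 0.8
print(variance_grad(theta, (1.0, 2.0)), 0.5 * math.sin(theta) * math.cos(theta))  # O = diag(1, 2)
print(variance_grad(theta, (1.0, -1.0)), 2 * math.sin(theta) * math.cos(theta))   # O = Z
```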
In the case where \(O\) is involutory (\(\hat{O}^2 = I\)), the first term in the above expression vanishes, and we are simply left with
\[\frac{\partial g}{\partial \mathbf{p}} = - 2 f(\mathbf{p}) \frac{\partial f}{\partial \mathbf{p}}.\]
Example
This transform can be registered directly as the quantum gradient transform to use during autodifferentiation:
>>> import pennylane as qml
>>> from pennylane import numpy as np
>>> dev = qml.device("default.qubit", wires=2)
>>> @qml.qnode(dev, interface="autograd", diff_method="parameter-shift")
... def circuit(params):
...     qml.RX(params[0], wires=0)
...     qml.RY(params[1], wires=0)
...     qml.RX(params[2], wires=0)
...     return qml.expval(qml.PauliZ(0))
>>> params = np.array([0.1, 0.2, 0.3], requires_grad=True)
>>> qml.jacobian(circuit)(params)
tensor([-0.38751725, -0.18884792, -0.38355708], requires_grad=True)
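To see where these numbers come from, the plain-Python sketch below (an independent illustration, not PennyLane code) simulates the circuit with \(2\times 2\) matrices and applies the two-term shift rule to each parameter in turn. Only wire 0 is simulated, since the circuit never touches wire 1. The helper names are hypothetical.

```python
import math


def rx(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]


def ry(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def apply(gate, state):
    """Multiply a 2x2 gate onto a single-qubit state vector."""
    return [
        gate[0][0] * state[0] + gate[0][1] * state[1],
        gate[1][0] * state[0] + gate[1][1] * state[1],
    ]


def expval_z(params):
    """<Z> on wire 0 after RX(p0), RY(p1), RX(p2) applied to |0>."""
    state = [1 + 0j, 0j]
    for gate, p in zip((rx, ry, rx), params):
        state = apply(gate(p), state)
    return abs(state[0]) ** 2 - abs(state[1]) ** 2


def param_shift_grad(f, params):
    """Two-term shift rule, shifting one parameter at a time by +/- pi/2."""
    grads = []
    for i in range(len(params)):
        plus, minus = list(params), list(params)
        plus[i] += math.pi / 2
        minus[i] -= math.pi / 2
        grads.append(0.5 * (f(plus) - f(minus)))
    return grads


print(param_shift_grad(expval_z, [0.1, 0.2, 0.3]))
```

The result agrees with the Jacobian printed above.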
When differentiating QNodes with multiple measurements using Autograd or TensorFlow, the outputs of the QNode first need to be stacked. The reason is that those two frameworks only allow differentiating functions with array or tensor outputs, instead of functions that output sequences. In contrast, JAX and Torch require no additional post-processing.
>>> import jax
>>> dev = qml.device("default.qubit", wires=2)
>>> @qml.qnode(dev, interface="jax", diff_method="parameter-shift")
... def circuit(params):
...     qml.RX(params[0], wires=0)
...     qml.RY(params[1], wires=0)
...     qml.RX(params[2], wires=0)
...     return qml.expval(qml.PauliZ(0)), qml.var(qml.PauliZ(0))
>>> params = jax.numpy.array([0.1, 0.2, 0.3])
>>> jax.jacobian(circuit)(params)
(Array([-0.38751727, -0.18884793, -0.3835571 ], dtype=float32), Array([0.6991687 , 0.34072432, 0.6920237 ], dtype=float32))
Note
param_shift performs multiple attempts to obtain the gradient recipes for each operation:
1. If an operation has a custom grad_recipe defined, it is used.
2. If parameter_frequencies yields a result, the frequencies are used to construct the general parameter-shift rule via generate_shift_rule(). Note that by default, the generator is used to compute the parameter frequencies if they are not provided via a custom implementation.
That is, the order of precedence is grad_recipe, custom parameter_frequencies, and finally generator() via the default implementation of the frequencies.
Warning
Note that using parameter broadcasting via broadcast=True is not supported for tapes with multiple return values or for evaluations with shot vectors. As the option broadcast=True adds a broadcasting dimension, it is not compatible with circuits that are already broadcasted. Finally, operations with trainable parameters are required to support broadcasting. One way of checking this is the attribute supports_broadcasting:
>>> qml.RX in qml.ops.qubit.attributes.supports_broadcasting
True
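Relating this to the Note on gradient recipes above: when neither a custom grad_recipe nor custom parameter_frequencies is available, the frequencies follow from the generator. For a gate \(e^{-i\theta G}\), they are the distinct positive differences between eigenvalues of \(G\). The helper below is a hypothetical sketch of that computation, not PennyLane API.

```python
def frequencies_from_eigvals(eigvals, decimals=8):
    """Distinct positive differences between generator eigenvalues (rounded to merge float noise)."""
    diffs = {round(abs(x - y), decimals) for x in eigvals for y in eigvals}
    return sorted(d for d in diffs if d > 0)


# RX(theta) = exp(-i theta X / 2): generator eigenvalues +/- 1/2 give one frequency.
print(frequencies_from_eigvals([0.5, -0.5]))            # [1.0]
# A controlled rotation: generator eigenvalues 0, 0, +/- 1/2 give two frequencies.
print(frequencies_from_eigvals([0.0, 0.0, 0.5, -0.5]))  # [0.5, 1.0]
```

The two frequencies of the controlled rotation are equidistant multiples of 0.5, so \(R = 2\) and the general shift rule has four terms instead of two.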
Usage Details
This gradient transform can be applied directly to QNode objects:
>>> params = np.array([0.1, 0.2, 0.3], requires_grad=True)
>>> @qml.qnode(dev)
... def circuit(params):
...     qml.RX(params[0], wires=0)
...     qml.RY(params[1], wires=0)
...     qml.RX(params[2], wires=0)
...     return qml.expval(qml.PauliZ(0)), qml.var(qml.PauliZ(0))
>>> qml.gradients.param_shift(circuit)(params)
((tensor(-0.38751724, requires_grad=True), tensor(-0.18884792, requires_grad=True), tensor(-0.38355709, requires_grad=True)), (tensor(0.69916868, requires_grad=True), tensor(0.34072432, requires_grad=True), tensor(0.69202366, requires_grad=True)))
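The nesting of this output (one inner tuple per measurement, one entry per trainable parameter) can be reproduced with a plain-Python sketch. It assumes the closed form \(f(\mathbf{p}) = \cos p_0 \cos p_1 \cos p_2 - \sin p_0 \sin p_2\) for \(\langle Z\rangle\) of this circuit, obtained by rotating the Bloch vector of \(\vert 0\rangle\). Since \(Z\) is involutory, it uses \(\partial g/\partial p_i = -2 f\, \partial f/\partial p_i\) for the variance. The function names are hypothetical.

```python
import math


def expval_z(p):
    """Closed form of <Z> after RX(p0), RY(p1), RX(p2) on |0>."""
    return math.cos(p[0]) * math.cos(p[1]) * math.cos(p[2]) - math.sin(p[0]) * math.sin(p[2])


def shifted(p, i, delta):
    q = list(p)
    q[i] += delta
    return q


def jacobian_expval_and_var(p):
    """Return ((d<Z>/dp_i)_i, (dVar(Z)/dp_i)_i), mirroring the nested output above."""
    f0 = expval_z(p)  # unshifted value, reused for the variance term
    d_expval = tuple(
        0.5 * (expval_z(shifted(p, i, math.pi / 2)) - expval_z(shifted(p, i, -math.pi / 2)))
        for i in range(len(p))
    )
    d_var = tuple(-2 * f0 * d for d in d_expval)  # Z is involutory, so d<Z^2>/dp_i = 0
    return d_expval, d_var


print(jacobian_expval_and_var([0.1, 0.2, 0.3]))
```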
This quantum gradient transform can also be applied to low-level QuantumTape objects. This will result in no implicit quantum device evaluation. Instead, the processed tapes and the post-processing function, which together define the gradient, are returned directly:
>>> with qml.tape.QuantumTape() as tape:
...     qml.RX(params[0], wires=0)
...     qml.RY(params[1], wires=0)
...     qml.RX(params[2], wires=0)
...     qml.expval(qml.PauliZ(0))
...     qml.var(qml.PauliZ(0))
>>> gradient_tapes, fn = qml.gradients.param_shift(tape)
>>> gradient_tapes
[<QuantumTape: wires=[0, 1], params=3>, <QuantumTape: wires=[0, 1], params=3>, <QuantumTape: wires=[0, 1], params=3>, <QuantumTape: wires=[0, 1], params=3>, <QuantumTape: wires=[0, 1], params=3>, <QuantumTape: wires=[0, 1], params=3>]
This can be useful if the underlying circuits representing the gradient computation need to be analyzed.
The output tapes can then be evaluated and post-processed to retrieve the gradient:
>>> dev = qml.device("default.qubit", wires=2)
>>> fn(qml.execute(gradient_tapes, dev, None))
((array(-0.3875172), array(-0.18884787), array(-0.38355704)), (array(0.69916862), array(0.34072424), array(0.69202359)))
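The tapes-plus-post-processing pattern can be mimicked in plain Python. In the hypothetical sketch below, a "tape" is simply a shifted parameter vector, "executing" it means evaluating an assumed closed form of \(\langle Z\rangle\) for this circuit, \(\cos p_0\cos p_1\cos p_2 - \sin p_0\sin p_2\), and fn turns the flat list of results back into one derivative per parameter. Two shifts for each of three parameters give six tapes, matching the count above.

```python
import math


def expval_z(p):
    """Assumed closed form of <Z> for RX(p0), RY(p1), RX(p2) applied to |0>."""
    return math.cos(p[0]) * math.cos(p[1]) * math.cos(p[2]) - math.sin(p[0]) * math.sin(p[2])


def param_shift_tapes(params):
    """Hypothetical analogue of (gradient_tapes, fn): shifted parameter sets plus post-processing."""
    tapes = []
    for i in range(len(params)):
        for sign in (+1, -1):
            q = list(params)
            q[i] += sign * math.pi / 2
            tapes.append(q)

    def fn(results):
        # results[2i] and results[2i + 1] are the +pi/2 and -pi/2 evaluations for parameter i
        return tuple(0.5 * (results[2 * i] - results[2 * i + 1]) for i in range(len(params)))

    return tapes, fn


gradient_tapes, fn = param_shift_tapes([0.1, 0.2, 0.3])
results = [expval_z(t) for t in gradient_tapes]  # stands in for device execution
print(len(gradient_tapes), fn(results))
```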
Devices that have a shot vector defined can also be used for execution, provided the shots argument was passed to the transform:
>>> shots = (10, 100, 1000)
>>> dev = qml.device("default.qubit", wires=2, shots=shots)
>>> @qml.qnode(dev)
... def circuit(params):
...     qml.RX(params[0], wires=0)
...     qml.RY(params[1], wires=0)
...     qml.RX(params[2], wires=0)
...     return qml.expval(qml.PauliZ(0)), qml.var(qml.PauliZ(0))
>>> params = np.array([0.1, 0.2, 0.3], requires_grad=True)
>>> qml.gradients.param_shift(circuit, shots=shots)(params)
(((array(-0.6), array(-0.1), array(-0.1)), (array(1.2), array(0.2), array(0.2))), ((array(-0.39), array(-0.24), array(-0.49)), (array(0.7488), array(0.4608), array(0.9408))), ((array(-0.36), array(-0.191), array(-0.37)), (array(0.65808), array(0.349148), array(0.67636))))
The outermost tuple contains results corresponding to each element of the shot vector.
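To illustrate why the results differ between shot-vector entries, the sketch below estimates each shifted expectation value from a finite number of simulated \(\pm 1\) outcomes, once per entry of the shot vector, and groups the gradients accordingly. It assumes the closed form \(\cos p_0\cos p_1\cos p_2 - \sin p_0\sin p_2\) for \(\langle Z\rangle\) of this circuit and a seeded pseudo-random generator; the sampling model, with \(P(+1) = (1 + \langle Z\rangle)/2\), is a simplified stand-in for the device's sampling, and the function names are hypothetical.

```python
import math
import random


def expval_z(p):
    """Assumed closed form of <Z> for RX(p0), RY(p1), RX(p2) applied to |0>."""
    return math.cos(p[0]) * math.cos(p[1]) * math.cos(p[2]) - math.sin(p[0]) * math.sin(p[2])


def sampled_expval_z(p, shots, rng):
    """Estimate <Z> from `shots` simulated +/-1 outcomes with P(+1) = (1 + <Z>) / 2."""
    p_plus = (1 + expval_z(p)) / 2
    outcomes = [1 if rng.random() < p_plus else -1 for _ in range(shots)]
    return sum(outcomes) / shots


def shot_vector_gradients(params, shot_vector, seed=1234):
    """One tuple of parameter-shift derivatives per shot-vector entry."""
    rng = random.Random(seed)
    per_entry = []
    for shots in shot_vector:
        grads = []
        for i in range(len(params)):
            plus, minus = list(params), list(params)
            plus[i] += math.pi / 2
            minus[i] -= math.pi / 2
            grads.append(0.5 * (sampled_expval_z(plus, shots, rng) - sampled_expval_z(minus, shots, rng)))
        per_entry.append(tuple(grads))
    return tuple(per_entry)


print(shot_vector_gradients([0.1, 0.2, 0.3], (10, 100, 1000)))
```

Fewer shots give noisier estimates, which is the pattern visible in the output above.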
When setting the keyword argument broadcast to True, the shifted circuit evaluations for each operation are batched together, resulting in broadcasted tapes:
>>> dev = qml.device("default.qubit", wires=2)
>>> params = np.array([0.1, 0.2, 0.3], requires_grad=True)
>>> with qml.tape.QuantumTape() as tape:
...     qml.RX(params[0], wires=0)
...     qml.RY(params[1], wires=0)
...     qml.RX(params[2], wires=0)
...     qml.expval(qml.PauliZ(0))
>>> gradient_tapes, fn = qml.gradients.param_shift(tape, broadcast=True)
>>> len(gradient_tapes)
3
>>> [t.batch_size for t in gradient_tapes]
[2, 2, 2]
The postprocessing function will know that broadcasting is used and handle the results accordingly:
>>> fn(qml.execute(gradient_tapes, dev, None))
(array(-0.3875172), array(-0.18884787), array(-0.38355704))
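A plain-Python toy model of the batching: for each trainable parameter, the two shifted parameter sets are stacked into one batch of size 2, so three parameters yield three batches instead of six separate evaluations. Here evaluate_batch is a hypothetical stand-in for a device that executes a broadcasted tape in one call, and \(\langle Z\rangle\) is taken from its assumed closed form \(\cos p_0\cos p_1\cos p_2 - \sin p_0\sin p_2\) for this circuit.

```python
import math


def expval_z(p):
    """Assumed closed form of <Z> for RX(p0), RY(p1), RX(p2) applied to |0>."""
    return math.cos(p[0]) * math.cos(p[1]) * math.cos(p[2]) - math.sin(p[0]) * math.sin(p[2])


def broadcast_batches(params, shifts=(math.pi / 2, -math.pi / 2)):
    """One batch per parameter, containing one shifted parameter set per shift value."""
    batches = []
    for i in range(len(params)):
        batch = []
        for s in shifts:
            q = list(params)
            q[i] += s
            batch.append(q)
        batches.append(batch)
    return batches


def evaluate_batch(batch):
    """Hypothetical broadcasted execution: all entries of a batch in a single call."""
    return [expval_z(q) for q in batch]


def fn(batch_results):
    # Each batch result holds the (+pi/2, -pi/2) evaluations for one parameter.
    return tuple(0.5 * (plus - minus) for plus, minus in batch_results)


batches = broadcast_batches([0.1, 0.2, 0.3])
print(len(batches), [len(b) for b in batches])  # 3 [2, 2, 2]
print(fn([evaluate_batch(b) for b in batches]))
```

The batch sizes mirror the batch_size values shown above. The actual saving comes from a device processing each batch in one vectorized pass, which this toy evaluate_batch does not model.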
An advantage of using broadcast=True is a speedup:
>>> import timeit
>>> @qml.qnode(dev)
... def circuit(params):
...     qml.RX(params[0], wires=0)
...     qml.RY(params[1], wires=0)
...     qml.RX(params[2], wires=0)
...     return qml.expval(qml.PauliZ(0))
>>> number = 100
>>> serial_call = "qml.gradients.param_shift(circuit, broadcast=False)(params)"
>>> timeit.timeit(serial_call, globals=globals(), number=number) / number
0.020183045039993887
>>> broadcasted_call = "qml.gradients.param_shift(circuit, broadcast=True)(params)"
>>> timeit.timeit(broadcasted_call, globals=globals(), number=number) / number
0.01244492811998498
This speedup grows with the number of shifts and qubits until all preprocessing and post-processing overhead becomes negligible. While it depends strongly on the details of the circuit, at least a small improvement can be expected in most cases. Note that broadcast=True increases the memory requirements by a factor equal to the largest batch_size among the created tapes.