qml.gradients.param_shift

param_shift(tape, argnum=None, shifts=None, gradient_recipes=None, fallback_fn=<function finite_diff>, f0=None, broadcast=False, shots=None)

Transform a QNode to compute the parameter-shift gradient of all gate parameters with respect to its inputs.

Parameters
  • tape (pennylane.QNode or QuantumTape) – quantum tape or QNode to differentiate

  • argnum (int or list[int] or None) – Trainable parameter indices to differentiate with respect to. If not provided, the derivatives with respect to all trainable parameters are returned.

  • shifts (list[tuple[int or float]]) – List containing tuples of shift values. If provided, one tuple of shifts should be given per trainable parameter and the tuple should match the number of frequencies for that parameter. If unspecified, equidistant shifts are assumed.

  • gradient_recipes (tuple(list[list[float]] or None)) –

    List of gradient recipes for the parameter-shift method. One gradient recipe must be provided per trainable parameter.

    This is a tuple with one nested list per parameter. For parameter \(\phi_k\), the nested list contains elements of the form \([c_i, a_i, s_i]\) where \(i\) is the index of the term, resulting in a gradient recipe of

    \[\frac{\partial}{\partial\phi_k}f = \sum_{i} c_i f(a_i \phi_k + s_i).\]

    If None, the default gradient recipe containing the two terms \([c_0, a_0, s_0]=[1/2, 1, \pi/2]\) and \([c_1, a_1, s_1]=[-1/2, 1, -\pi/2]\) is assumed for every parameter.

  • fallback_fn (None or Callable) – a fallback gradient function to use for any parameters that do not support the parameter-shift rule.

  • f0 (tensor_like[float] or None) – Output of the evaluated input tape. If provided, and the gradient recipe contains an unshifted term, this value is used, saving a quantum evaluation.

  • broadcast (bool) – Whether or not to use parameter broadcasting to create a single broadcasted tape per operation instead of one tape per shift angle.

  • shots (None, int, list[int]) – Argument used by the new return type system (see enable_return() for more information); it represents the device shots that will be used to execute the tapes outputted by this transform. Note that this argument doesn’t influence the shots used for tape execution, but provides information to the transform about the device shots and helps in determining if a shot sequence was used.
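The roles of gradient_recipes and f0 described above can be sketched in plain Python. This is a minimal illustration, not PennyLane's implementation: apply_recipe is a hypothetical helper, math.cos stands in for the expectation value of PauliZ after RX(phi), and the forward-difference recipe is an assumed example of a recipe that contains an unshifted term.

```python
import math

def apply_recipe(f, phi, recipe, f0=None):
    """Evaluate sum_i c_i * f(a_i * phi + s_i) for a single parameter.

    Hypothetical helper: if f0 is given, it is reused for unshifted terms
    (a_i == 1 and s_i == 0) instead of calling f again.
    Returns the derivative estimate and the number of calls made to f.
    """
    total, calls = 0.0, 0
    for c, a, s in recipe:
        if f0 is not None and a == 1 and s == 0:
            value = f0  # unshifted term: reuse the known output
        else:
            value = f(a * phi + s)
            calls += 1
        total += c * value
    return total, calls

# Toy stand-in for <Z> after RX(phi)|0>, which equals cos(phi)
f = math.cos
phi = 0.4

# Default two-term recipe: [c, a, s] = [1/2, 1, pi/2] and [-1/2, 1, -pi/2]
default_recipe = [[0.5, 1, math.pi / 2], [-0.5, 1, -math.pi / 2]]
grad, calls = apply_recipe(f, phi, default_recipe)

# Assumed example of a recipe with an unshifted term (forward difference)
h = 1e-6
fd_recipe = [[1 / h, 1, h], [-1 / h, 1, 0]]
fd_grad, fd_calls = apply_recipe(f, phi, fd_recipe, f0=f(phi))
```

With the default recipe both terms are shifted, so two evaluations are needed and f0 would not help. With the forward-difference recipe, passing f0 leaves only one new evaluation.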

Returns

  • If the input is a QNode, a tensor representing the output Jacobian matrix of size (number_outputs, number_gate_parameters) is returned.

  • If the input is a tape, a tuple containing a list of generated tapes and a post-processing function to be applied to the evaluated tapes is returned.

Return type

tensor_like or tuple[list[QuantumTape], function]

For a variational evolution \(U(\mathbf{p}) \vert 0\rangle\) with \(N\) parameters \(\mathbf{p}\), consider the expectation value of an observable \(\hat{O}\):

\[f(\mathbf{p}) = \langle \hat{O} \rangle(\mathbf{p}) = \langle 0 \vert U(\mathbf{p})^\dagger \hat{O} U(\mathbf{p}) \vert 0\rangle.\]

The gradient of this expectation value can be calculated via the parameter-shift rule:

\[\frac{\partial f}{\partial \mathbf{p}} = \sum_{\mu=1}^{2R} f\left(\mathbf{p}+\frac{2\mu-1}{2R}\pi\right) \frac{(-1)^{\mu-1}}{4R\sin^2\left(\frac{2\mu-1}{4R}\pi\right)}\]

Here, \(R\) is the number of frequencies with which the parameter \(\mathbf{p}\) enters the function \(f\) via the operation \(U\), and we assumed that these frequencies are equidistant. For more general shift rules, both regarding the shifts and the frequencies, and for more technical details, see Vidal and Theis (2018) and Wierichs et al. (2022).
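As a numerical check of the equidistant shift rule above, the following plain-Python sketch applies it to a toy cost function with integer frequencies 1 and 2 (so \(R = 2\)). The functions equidistant_shift_gradient, f, and df are illustrative names chosen here, and f is an assumed stand-in for a single-parameter expectation value, not a circuit evaluated by PennyLane.

```python
import math

def equidistant_shift_gradient(f, p, R):
    """Derivative of f at p via the 2R-term equidistant parameter-shift rule."""
    grad = 0.0
    for mu in range(1, 2 * R + 1):
        shift = (2 * mu - 1) * math.pi / (2 * R)
        angle = (2 * mu - 1) * math.pi / (4 * R)
        coeff = (-1) ** (mu - 1) / (4 * R * math.sin(angle) ** 2)
        grad += coeff * f(p + shift)
    return grad

def f(p):
    # Toy trigonometric cost with frequencies 1 and 2
    return 0.3 + 0.5 * math.cos(p) - 0.2 * math.sin(2 * p)

def df(p):
    # Analytic derivative of f, used only for comparison
    return -0.5 * math.sin(p) - 0.4 * math.cos(2 * p)

p = 0.7
estimate = equidistant_shift_gradient(f, p, R=2)

# For R = 1 the rule reduces to the familiar two-term rule
single_freq = equidistant_shift_gradient(math.cos, p, R=1)
```

Here estimate agrees with df(p) up to floating-point error, using four shifted evaluations of f.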

Gradients of variances

For a variational evolution \(U(\mathbf{p}) \vert 0\rangle\) with \(N\) parameters \(\mathbf{p}\), consider the variance of an observable \(\hat{O}\):

\[g(\mathbf{p})=\langle \hat{O}^2 \rangle (\mathbf{p}) - [\langle \hat{O} \rangle(\mathbf{p})]^2.\]

We can relate this directly to the parameter-shift rule by noting that

\[\frac{\partial g}{\partial \mathbf{p}}= \frac{\partial}{\partial \mathbf{p}} \langle \hat{O}^2 \rangle (\mathbf{p}) - 2 f(\mathbf{p}) \frac{\partial f}{\partial \mathbf{p}}.\]

The derivatives in the expression on the right hand side can be computed via the shift rule as above, allowing for the computation of the variance derivative.

In the case where \(\hat{O}\) is involutory (\(\hat{O}^2 = I\)), the first term in the above expression vanishes, and we are simply left with

\[\frac{\partial g}{\partial \mathbf{p}} = - 2 f(\mathbf{p}) \frac{\partial f}{\partial \mathbf{p}}.\]
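The involutory case can be checked on a one-parameter toy model. The sketch below assumes the expectation value of PauliZ after RX(p) acting on \(\vert 0\rangle\), which is \(\cos p\), so the variance is \(1 - \cos^2 p\). The helper names shift_grad, expval, and variance are illustrative only.

```python
import math

def shift_grad(f, p):
    """Two-term parameter-shift derivative (single frequency, R = 1)."""
    return 0.5 * (f(p + math.pi / 2) - f(p - math.pi / 2))

def expval(p):
    # <Z> after RX(p)|0>
    return math.cos(p)

def variance(p):
    # Var(Z) = <Z^2> - <Z>^2 = 1 - cos(p)^2, since Z^2 = I
    return 1.0 - expval(p) ** 2

p = 0.4

# Variance derivative from the expectation value and its shift-rule derivative
var_grad = -2.0 * expval(p) * shift_grad(expval, p)

# Central finite difference of the variance itself, for comparison only
eps = 1e-6
var_grad_fd = (variance(p + eps) - variance(p - eps)) / (2 * eps)
```

Both values agree with the analytic derivative \(\sin(2p)\), and no evaluation of \(\langle \hat{O}^2 \rangle\) is required.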

Example

This transform can be registered directly as the quantum gradient transform to use during autodifferentiation:

>>> import pennylane as qml
>>> from pennylane import numpy as np
>>> dev = qml.device("default.qubit", wires=2)
>>> @qml.qnode(dev, gradient_fn=qml.gradients.param_shift)
... def circuit(params):
...     qml.RX(params[0], wires=0)
...     qml.RY(params[1], wires=0)
...     qml.RX(params[2], wires=0)
...     return qml.expval(qml.PauliZ(0)), qml.var(qml.PauliZ(0))
>>> params = np.array([0.1, 0.2, 0.3], requires_grad=True)
>>> qml.jacobian(circuit)(params)
tensor([[-0.38751725, -0.18884792, -0.38355708],
        [ 0.69916868,  0.34072432,  0.69202365]], requires_grad=True)

Note

param_shift performs multiple attempts to obtain the gradient recipes for each operation:

  • If an operation has a custom grad_recipe defined, it is used.

  • If parameter_frequencies yields a result, the frequencies are used to construct the general parameter-shift rule via generate_shift_rule(). Note that by default, the generator is used to compute the parameter frequencies if they are not provided via a custom implementation.

That is, the order of precedence is grad_recipe, custom parameter_frequencies, and finally generator() via the default implementation of the frequencies.
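The precedence described in this note can be sketched as follows. This is a hypothetical illustration, not PennyLane code: operations are modelled as plain dictionaries, pick_recipe_source and frequencies_from_eigvals are names invented here, and the final fallback branch is an assumption based on the fallback_fn parameter described above. The frequency helper uses the fact that the frequencies are the distinct positive differences of the generator's eigenvalues.

```python
def frequencies_from_eigvals(eigvals):
    """Distinct positive differences of generator eigenvalues."""
    diffs = {round(abs(a - b), 12) for a in eigvals for b in eigvals}
    return sorted(d for d in diffs if d > 0)

def pick_recipe_source(op):
    """Return which source would be used, following the stated precedence.

    `op` is a hypothetical dict standing in for an operation.
    """
    if op.get("grad_recipe") is not None:
        return "grad_recipe"
    if op.get("parameter_frequencies") is not None:
        return "parameter_frequencies"
    if op.get("generator_eigvals") is not None:
        return "generator"
    return "fallback_fn"  # assumption: no shift rule available

# RX has generator -X/2 with eigenvalues -1/2 and +1/2, giving one frequency
rx_frequencies = frequencies_from_eigvals([-0.5, 0.5])

custom_op = {"grad_recipe": [[0.5, 1, 1.0], [-0.5, 1, -1.0]],
             "parameter_frequencies": (1.0,)}
generator_only_op = {"generator_eigvals": [-0.5, 0.5]}

custom_source = pick_recipe_source(custom_op)
generator_source = pick_recipe_source(generator_only_op)
unsupported_source = pick_recipe_source({})
```

In this sketch a custom grad_recipe wins even when frequencies are also available, matching the order given in the note.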

Warning

Note that using parameter broadcasting via broadcast=True is not supported for tapes with multiple return values or for evaluations with shot vectors. As the option broadcast=True adds a broadcasting dimension, it is not compatible with circuits that already are broadcasted. Finally, operations with trainable parameters are required to support broadcasting. One way of checking this is the Attribute supports_broadcasting:

>>> qml.RX in qml.ops.qubit.attributes.supports_broadcasting
True

This gradient transform can be applied directly to QNode objects:

>>> @qml.qnode(dev)
... def circuit(params):
...     qml.RX(params[0], wires=0)
...     qml.RY(params[1], wires=0)
...     qml.RX(params[2], wires=0)
...     return qml.expval(qml.PauliZ(0)), qml.var(qml.PauliZ(0))
>>> qml.gradients.param_shift(circuit)(params)
tensor([[-0.38751725, -0.18884792, -0.38355708],
        [ 0.69916868,  0.34072432,  0.69202365]], requires_grad=True)

This quantum gradient transform can also be applied to low-level QuantumTape objects. This will result in no implicit quantum device evaluation. Instead, the processed tapes and the post-processing function, which together define the gradient, are returned directly:

>>> with qml.tape.QuantumTape() as tape:
...     qml.RX(params[0], wires=0)
...     qml.RY(params[1], wires=0)
...     qml.RX(params[2], wires=0)
...     qml.expval(qml.PauliZ(0))
...     qml.var(qml.PauliZ(0))
>>> gradient_tapes, fn = qml.gradients.param_shift(tape)
>>> gradient_tapes
[<QuantumTape: wires=[0, 1], params=3>,
 <QuantumTape: wires=[0, 1], params=3>,
 <QuantumTape: wires=[0, 1], params=3>,
 <QuantumTape: wires=[0, 1], params=3>,
 <QuantumTape: wires=[0, 1], params=3>,
 <QuantumTape: wires=[0, 1], params=3>]

This can be useful if the underlying circuits representing the gradient computation need to be analyzed.

The output tapes can then be evaluated and post-processed to retrieve the gradient:

>>> dev = qml.device("default.qubit", wires=2)
>>> fn(qml.execute(gradient_tapes, dev, None))
[[-0.38751721 -0.18884787 -0.38355704]
 [ 0.69916862  0.34072424  0.69202359]]

When setting the keyword argument broadcast to True, the shifted circuit evaluations for each operation are batched together, resulting in broadcasted tapes:

>>> params = np.array([0.1, 0.2, 0.3], requires_grad=True)
>>> with qml.tape.QuantumTape() as tape:
...     qml.RX(params[0], wires=0)
...     qml.RY(params[1], wires=0)
...     qml.RX(params[2], wires=0)
...     qml.expval(qml.PauliZ(0))
>>> gradient_tapes, fn = qml.gradients.param_shift(tape, broadcast=True)
>>> len(gradient_tapes)
3
>>> [t.batch_size for t in gradient_tapes]
[2, 2, 2]

The postprocessing function will know that broadcasting is used and handle the results accordingly:

>>> fn(qml.execute(gradient_tapes, dev, None))
array([[-0.3875172 , -0.18884787, -0.38355704]])

An advantage of using broadcast=True is a speedup:

>>> @qml.qnode(dev)
... def circuit(params):
...     qml.RX(params[0], wires=0)
...     qml.RY(params[1], wires=0)
...     qml.RX(params[2], wires=0)
...     return qml.expval(qml.PauliZ(0))
>>> import timeit
>>> number = 100
>>> serial_call = "qml.gradients.param_shift(circuit, broadcast=False)(params)"
>>> timeit.timeit(serial_call, globals=globals(), number=number) / number
0.020183045039993887
>>> broadcasted_call = "qml.gradients.param_shift(circuit, broadcast=True)(params)"
>>> timeit.timeit(broadcasted_call, globals=globals(), number=number) / number
0.01244492811998498

This speedup grows with the number of shifts and qubits until all preprocessing and postprocessing overhead becomes negligible. While it will depend strongly on the details of the circuit, at least a small improvement can be expected in most cases. Note that broadcast=True increases the memory requirement by a factor equal to the largest batch_size of the created tapes.
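The tape counts and batch sizes shown in the broadcasting example can be reproduced with a small bookkeeping sketch. It models each tape as a plain list of parameter values, assumes the default two-term rule with shifts of \(\pm\pi/2\), and uses the hypothetical helper shifted_parameter_sets. It does not build or execute PennyLane tapes.

```python
import math

def shifted_parameter_sets(params, shifts=(math.pi / 2, -math.pi / 2), broadcast=False):
    """Build the parameter sets needed for a two-term shift rule.

    Each returned list is a stand-in for one tape. With broadcast=True,
    entry k of the k-th list holds all shifted values at once.
    """
    tapes = []
    for k in range(len(params)):
        shifted_values = [params[k] + s for s in shifts]
        if broadcast:
            tape = list(params)
            tape[k] = shifted_values  # one batched tape per parameter
            tapes.append(tape)
        else:
            for value in shifted_values:
                tape = list(params)
                tape[k] = value  # one tape per shift
                tapes.append(tape)
    return tapes

params = [0.1, 0.2, 0.3]

serial = shifted_parameter_sets(params, broadcast=False)
batched = shifted_parameter_sets(params, broadcast=True)

# Batch size of each broadcasted stand-in tape
batch_sizes = [len(tape[k]) for k, tape in enumerate(batched)]
```

Here serial has six entries and batched has three, each with a batch size of two. The same six circuit evaluations are performed, but grouped into fewer, larger executions.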