stoch_pulse_grad(tape, argnum=None, num_split_times=1, sampler_seed=None, use_broadcasting=False)[source]

Compute the gradient of a quantum circuit composed of pulse sequences by applying the stochastic parameter shift rule.

For a pulse-based cost function \(C(\boldsymbol{v}, T)\) with variational parameters \(\boldsymbol{v}\) and evolution time \(T\), it is given by (c.f. Eqn. (6) in Leng et al. (2022) with altered notation):

\[\frac{\partial C}{\partial v_k} = \int_{0}^{T} \mathrm{d}\tau \sum_{j=1}^m \frac{\partial f_j}{\partial v_k}(\boldsymbol{v}, \tau) \left[C_j^{(+)}(\boldsymbol{v}, \tau) - C_j^{(-)}(\boldsymbol{v}, \tau)\right]\]

Here, \(f_j\) are the pulse envelopes that capture the time dependence of the pulse Hamiltonian:

\[H(\boldsymbol{v}, t) = H_\text{drift} + \sum_j f_j(\boldsymbol{v}, t) H_j,\]

and \(C_j^{(\pm)}\) are modified cost functions:

\[\begin{split}C_j^{(\pm)}(\boldsymbol{v}, \tau)&= \bra{\psi^{(\pm)}_{j}(\boldsymbol{v}, \tau)} B \ket{\psi^{(\pm)}_{j}(\boldsymbol{v}, \tau)} \\ \ket{\psi^{(\pm)}_{j}(\boldsymbol{v}, \tau)} &= U_{\boldsymbol{v}}(T, \tau) e^{-i (\pm \frac{\pi}{4}) H_j} U_{\boldsymbol{v}}(\tau, 0)\ket{\psi_0}.\end{split}\]

That is, the \(j\)th modified time evolution in these circuit interrupts the evolution generated by the pulse Hamiltonian by inserting a rotation gate generated by the corresponding Hamiltonian term \(H_j\) with a rotation angle of \(\pm\frac{\pi}{4}\).

See below for a more detailed description. The integral in the first equation above is estimated numerically in the stochastic parameter-shift rule. For this, it samples split times \(\tau\) and averages the modified cost functions and the Jacobians of the envelopes \(\partial f_j / \partial v_k\) at the sampled times suitably.

  • tape (QuantumTape) – quantum circuit to differentiate

  • argnum (int or list[int] or None) – Trainable tape parameter indices to differentiate with respect to. If not provided, the derivatives with respect to all trainable parameters are returned.

  • num_split_times (int) – number of time samples to use in the stochastic parameter-shift rule underlying the differentiation; also see details

  • sample_seed (int) – randomness seed to be used for the time samples in the stochastic parameter-shift rule

  • use_broadcasting (bool) – Whether to use broadcasting across the different sampled splitting times. If False (the default), one set of modified tapes per splitting time is created, if True only a single set of broadcasted, modified tapes is created, increasing performance on simulators.


The transformed circuit as described in qml.transform. Executing this circuit will provide the Jacobian in the form of a tensor, a tuple, or a nested tuple depending upon the nesting structure of measurements in the original circuit.

Return type

tuple[List[QuantumTape], function]

This transform realizes the stochastic parameter-shift rule for pulse sequences, as introduced in Banchi and Crooks (2018) and Leng et al. (2022).


This function requires the JAX interface and does not work with other autodiff interfaces commonly encountered with PennyLane. Finally, this transform is not JIT-compatible yet.


This function uses a basic sampling approach with a uniform distribution to estimate the integral appearing in the stochastic parameter-shift rule. In many cases, there are probability distributions that lead to smaller variances of the estimator. In addition, the sampling approach will not reduce trivially to simpler parameter-shift rules when used with simple pulses (see details and examples below), potentially leading to imprecise results and/or unnecessarily large computational efforts.


This transform may not be applied directly to QNodes. Use JAX entrypoints (jax.grad, jax.jacobian, …) instead or apply the transform on the tape level. Also see the examples below.


Consider a pulse program with a single two-qubit pulse, generated by a Hamiltonian with three terms: the non-trainable term \(\frac{1}{2}X_0\), the trainable constant (over time) term \(v_1 Z_0 Z_1\) and the trainable sinoidal term \(\sin(v_2 t) (\frac{1}{5} Y_0 + \frac{7}{10} X_1)\).

jax.config.update("jax_enable_x64", True)

dev = qml.device("default.qubit.jax", wires=2)

def sin(p, t):
    return jax.numpy.sin(p * t)

ZZ = qml.PauliZ(0) @ qml.PauliZ(1)
Y_plus_X = qml.dot([1/5, 3/5], [qml.PauliY(0), qml.PauliX(1)])
H = 0.5 * qml.PauliX(0) + qml.pulse.constant * ZZ + sin * Y_plus_X

def ansatz(params):
    qml.evolve(H)(params, (0.2, 0.4))
    return qml.expval(qml.PauliY(1))

qnode = qml.QNode(ansatz, dev, interface="jax", diff_method=qml.gradients.stoch_pulse_grad)

The program takes the two parameters \(v_1, v_2\) for the two trainable terms:

>>> params = [jax.numpy.array(0.4), jax.numpy.array(1.3)]
>>> qnode(params)
Array(-0.0905377, dtype=float64)

And as we registered the differentiation method stoch_pulse_grad(), we can compute its gradient in a hardware compatible manner:

>>> jax.grad(qnode)(params)
[Array(0.00109782, dtype=float64, weak_type=True),
 Array(-0.05833371, dtype=float64, weak_type=True)] # results may differ

Note that the derivative is computed using a stochastic parameter-shift rule, which is based on a sampled approximation of an integral expression (see theoretical background below). This makes the computed derivative an approximate quantity subject to statistical fluctuations with notable variance. The number of samples used to approximate the integral can be chosen with num_split_times, the seed for the sampling can be fixed with sampler_seed:

qnode = qml.QNode(
    num_split_times=5, # Use 5 samples for the approximation
    sampler_seed=18, # Fix randomness seed
>>> jax.grad(qnode)(params)
[Array(0.00207256, dtype=float64, weak_type=True),
 Array(-0.05989856, dtype=float64, weak_type=True)]

We may activate the option use_broadcasting to improve the performance when running on classical simulators. Internally, it reuses intermediate results of the time evolution. We can compare the performance with a simple test:

from time import process_time
faster_grad_qnode = qml.QNode(
    num_split_times=5, # Use 5 samples for the approximation
    sampler_seed=18, # Fix randomness seed
    use_broadcasting=True, # Activate broadcasting
times = []
for node in [qnode, faster_grad_qnode]:
    start = process_time()
    times.append(process_time() - start)
>>> print(times) # Show the gradient computation times in seconds.
[55.75785480000002, 12.297400500000009]


As the option use_broadcasting=True adds a broadcasting dimension to the modified circuits, it is not compatible with circuits that already are broadcasted.

Consider a pulse generated by a time-dependent Hamiltonian

\[H(\boldsymbol{v}, t) = H_\text{drift} + \sum_j f_j(v_j, t) H_j,\]

where \(\boldsymbol{v}=\{v_j\}\) are variational parameters and \(t\) is the time. In addition, consider a cost function that is based on using this pulse for a duration \(T\) in a pulse sequence and measuring the expectation value of an observable. For simplicity we absorb the parts of the sequence before and after the considered pulse into the initial state and the observable, respectively:

\[C(\boldsymbol{v}, t) = \bra{\psi_0} U_{\boldsymbol{v}}(T, 0)^\dagger B U_{\boldsymbol{v}}(T, 0)\ket{\psi_0}.\]

Here, we denoted the unitary evolution under \(H(\boldsymbol{v}, t)\) from time \(t_1\) to \(t_2\) as \(U_{\boldsymbol{v}(t_2, t_1)}\). Then the derivative of \(C\) with respect to a specific parameter \(v_k\) is given by (see Eqn. (6) of Leng et al. (2022))

\[\frac{\partial C}{\partial v_k} = \int_{0}^{T} \mathrm{d}\tau \sum_{j=1}^m \frac{\partial f_j}{\partial v_k}(\boldsymbol{v}, \tau) \widetilde{C_j}(\boldsymbol{v}, \tau).\]

Here, the integral ranges over the duration of the pulse, the partial derivatives of the coefficient functions, \(\partial f_j / \partial v_k\), are computed classically, and \(\widetilde{C_j}\) is a linear combination of the results from modified pulse sequence executions based on generalized parameter-shift rules (see e.g. Kyriienko and Elfving (2022) or Wierichs et al. (2022) for more details and param_shift() for an implementation of the non-stochastic generalized shift rules) Given the parameter shift rule with coefficients \(\{y_\ell\}\) and shifts \(\{x_\ell\}\) for the single-parameter pulse \(\exp(-i \theta H_j)\), the linear combination is given by

\[\begin{split}\widetilde{C_j}(\boldsymbol{v}, \tau)&=\sum_{\ell=1} y_\ell \bra{\psi_{j}(\boldsymbol{v}, x_\ell, \tau)} B \ket{\psi_{j}(\boldsymbol{v}, x_\ell, \tau)} \\ \ket{\psi_{j}(\boldsymbol{v}, x_\ell, \tau)} &= U_{\boldsymbol{v}}(T, \tau) e^{-i x_\ell H_j} U_{\boldsymbol{v}}(\tau, 0)\ket{\psi_0}.\end{split}\]

In practice, the time integral over \(\tau\) is computed by sampling values for the time, evaluating the integrand, and averaging appropriately. The probability distribution used for the sampling may have a significant impact on the quality of the obtained estimates, in particular with regards to their variance. In this function, a uniform distribution over the interval \([0, t]\) is used, which often can be improved upon.


Consider the pulse generated by

\[H(\boldsymbol{v}, t) = \frac{1}{2} X_0 + v_1 Z_0 Z_1 + \sin(v_2 t) X_1\]

and the observable \(B=Y_1\). There are two variational parameters, \(v_1\) and \(v_2\), for which we may compute the derivative of the cost function:

\[\begin{split}\frac{\partial C}{\partial v_1} &= \int_{0}^{T} \mathrm{d}\tau \ \widetilde{C_1}((v_1, v_2), \tau)\\ \frac{\partial C}{\partial v_2} &= \int_{0}^{T} \mathrm{d}\tau \cos(v_2 \tau) \tau \ \widetilde{C_2}((v_1, v_2), \tau)\\ \widetilde{C_j}((v_1, v_2), \tau)&= \bra{\psi_{j}((v_1, v_2), \pi/4, \tau)} B \ket{\psi_{j}((v_1, v_2), \pi/4, \tau)}\\ &-\bra{\psi_{j}((v_1, v_2), -\pi/4, \tau)} B \ket{\psi_{j}((v_1, v_2), -\pi/4, \tau)} \\ \ket{\psi_{j}((v_1, v_2), x, \tau)} &= U_{(v_1, v_2)}(T, \tau) e^{-i x H_j}U_{(v_1, v_2)}(\tau, 0)\ket{0}.\end{split}\]

Here we used the partial derivatives

\[\begin{split}\frac{\partial f_1}{\partial v_1}&= 1\\ \frac{\partial f_2}{\partial v_2}&= \cos(v_2 t) t \\ \frac{\partial f_1}{\partial v_2}= \frac{\partial f_2}{\partial v_1}&= 0\end{split}\]

and the fact that both \(H_1=Z_0 Z_1\) and \(H_2=X_1\) have two unique eigenvalues and therefore admit a two-term parameter-shift rule (see e.g. Schuld et al. (2018)).

As a second scenario, consider the single-qubit pulse generated by

\[H((v_1, v_2), t) = v_1 \sin(v_2 t) X\]

together with the observable \(B=Z\). You may already notice that this pulse can be rewritten as a RX rotation, because we have a single Hamiltonian term and the spectrum of \(H\) consequently will be constant up to rescaling. In particular, the unitary time evolution under the Schrödinger equation is given by

\[\begin{split}U_{(v_1, v_2)}(t_2, t_1) &= \exp\left(-i\int_{t_1}^{t_2} \mathrm{d}\tau v_1 \sin(v_2 \tau) X\right)\\ &=\exp(-i\theta(v_1, v_2) X)\\ \theta(v_1, v_2) &= \int_{t_1}^{t_2} \mathrm{d}\tau v_1 \sin(v_2 \tau)\\ &=-\frac{v_1}{v_2}(\cos(v_2 t_2) - \cos(v_2 t_1)).\end{split}\]

As the RX rotation satisfies a (non-stochastic) two-term parameter-shift rule, we could compute the derivatives with respect to \(v_1\) and \(v_2\) by implementing \(\exp(-i\theta(v_1, v_2) X)\), applying the two-term shift rule and evaluating the classical Jacobian of the mapping \(\theta(v_1, v_2)\).

Using the stochastic parameter-shift rule instead will lead to approximation errors. This is because the approximated integral not only includes the shifted circuit evaluations, which do not depend on \(\tau\) in this example, but also on the classical Jacobian, which is not constant over \(\tau\). Therefore, it is important to implement pulses in the simplest way possible.