stoch_pulse_grad(tape, argnum=None, num_split_times=1, sampler_seed=None, shots=None, use_broadcasting=False)

Compute the gradient of a quantum circuit composed of pulse sequences by applying the stochastic parameter shift rule.

For a pulse-based cost function $$C(\boldsymbol{v}, T)$$ with variational parameters $$\boldsymbol{v}$$ and evolution time $$T$$, it is given by (c.f. Eqn. (6) in Leng et al. (2022) with altered notation):

$\frac{\partial C}{\partial v_k} = \int_{0}^{T} \mathrm{d}\tau \sum_{j=1}^m \frac{\partial f_j}{\partial v_k}(\boldsymbol{v}, \tau) \left[C_j^{(+)}(\boldsymbol{v}, \tau) - C_j^{(-)}(\boldsymbol{v}, \tau)\right]$

Here, $$f_j$$ are the pulse envelopes that capture the time dependence of the pulse Hamiltonian:

$H(\boldsymbol{v}, t) = H_\text{drift} + \sum_j f_j(\boldsymbol{v}, t) H_j,$

and $$C_j^{(\pm)}$$ are modified cost functions:

$\begin{split}C_j^{(\pm)}(\boldsymbol{v}, \tau)&= \bra{\psi^{(\pm)}_{j}(\boldsymbol{v}, \tau)} B \ket{\psi^{(\pm)}_{j}(\boldsymbol{v}, \tau)} \\ \ket{\psi^{(\pm)}_{j}(\boldsymbol{v}, \tau)} &= U_{\boldsymbol{v}}(T, \tau) e^{-i \pm \frac{\pi}{2} H_j} U_{\boldsymbol{v}}(\tau, 0)\ket{\psi_0}.\end{split}$

That is, the $$j$$th modified time evolution in these circuit interrupts the evolution generated by the pulse Hamiltonian by inserting a rotation gate generated by the corresponding Hamiltonian term $$H_j$$ with a rotation angle of $$\pm\frac{\pi}{2}$$.

See below for a more detailed description. The integral in the first equation above is estimated numerically in the stochastic parameter-shift rule. For this, it samples split times $$\tau$$ and averages the modified cost functions and the Jacobians of the envelopes $$\partial f_j / \partial v_k$$ at the sampled times suitably.

Parameters
• tape (pennylane.QNode or QuantumTape) – quantum tape or QNode to differentiate

• argnum (int or list[int] or None) – Trainable tape parameter indices to differentiate with respect to. If not provided, the derivatives with respect to all trainable parameters are returned.

• num_split_times (int) – number of time samples to use in the stochastic parameter-shift rule underlying the differentiation; also see details

• sample_seed (int) – randomness seed to be used for the time samples in the stochastic parameter-shift rule

• shots (None, int, list[int]) – The shots of the device used to execute the tapes which are returned by this transform. Note that this argument does not influence the shots used for execution, but informs the transform about the shots to ensure a compatible return value structure.

• use_broadcasting (bool) – Whether to use broadcasting across the different sampled splitting times. If False (the default), one set of modified tapes per splitting time is created, if True only a single set of broadcasted, modified tapes is created, increasing performance on simulators.

Returns

• If the input is a QNode, an object representing the Jacobian (function) of the QNode that can be executed to obtain the Jacobian. The type of the Jacobian returned is either a tensor, a tuple or a nested tuple depending on the nesting structure of the original QNode output.

• If the input is a tape, a tuple containing a list of generated tapes, together with a post-processing function to be applied to the results of the evaluated tapes in order to obtain the Jacobian.

Return type

function or tuple[list[QuantumTape], function]

This transform realizes the stochastic parameter-shift rule for pulse sequences, as introduced in Banchi and Crooks (2018) and Leng et al. (2022).

Note

This function requires the JAX interface and does not work with other autodiff interfaces commonly encountered with PennyLane. Finally, this transform is not JIT-compatible yet.

Note

This function uses a basic sampling approach with a uniform distribution to estimate the integral appearing in the stochastic parameter-shift rule. In many cases, there are probability distributions that lead to smaller variances of the estimator. In addition, the sampling approach will not reduce trivially to simpler parameter-shift rules when used with simple pulses (see details and examples below), potentially leading to imprecise results and/or unnecessarily large computational efforts.

Note

Currently this function only supports pulses for which each parametrized term is a simple Pauli word. More general Hamiltonian terms are not supported yet.

Examples

Consider a pulse program with a single two-qubit pulse, generated by a Hamiltonian with three terms: the non-trainable term $$\frac{1}{2}X\otimes I$$, the trainable constant (over time) term $$v_1 Z\otimes Z$$ and the trainable sinoidal term $$\sin(v_2 t) I\otimes X$$.

from pennylane.gradients import stoch_pulse_grad

dev = qml.device("default.qubit.jax", wires=2)

def sin(p, t):
return jax.numpy.sin(p * t)

ZZ = qml.PauliZ(0) @ qml.PauliZ(1)
H = 0.5 * qml.PauliX(0) + qml.pulse.constant * ZZ + sin * qml.PauliX(1)

def ansatz(params):
qml.evolve(H)(params, (0.2, 0.4))
return qml.expval(qml.PauliY(1))



The program takes the two parameters $$v_1, v_2$$ for the two trainable terms:

>>> params = [jax.numpy.array(0.4), jax.numpy.array(1.3)]
>>> qnode(params)
Array(-0.15052551, dtype=float32)


And as we registered the differentiation method stoch_pulse_grad(), we can compute its gradient in a hardware compatible manner:

>>> jax.grad(qnode)(params)
[Array(0.00277838, dtype=float32, weak_type=True),
Array(-0.07787319, dtype=float32, weak_type=True)] # results may differ


Note that the derivative is computed using a stochastic parameter-shift rule, which is based on a sampled approximation of an integral expression (see theoretical background below). This makes the computed derivative an approximate quantity subject to statistical fluctuations with notable variance. The number of samples used to approximate the integral can be chosen with num_split_times, the seed for the sampling can be fixed with sampler_seed:

qnode = qml.QNode(
ansatz,
dev,
interface="jax",
num_split_times=5, # Use 5 samples for the approximation
sampler_seed=18, # Fix randomness seed
)

>>> jax.grad(qnode)(params)
[Array(0.00246266, dtype=float32, weak_type=True),
Array(-0.11399216, dtype=float32, weak_type=True)]


On simulator devices, we may activate the option use_broadcasting, which makes use of broadcasting internally to improve the performance of the stochastic parameter-shift rule:

from time import process_time
ansatz,
dev,
interface="jax",
num_split_times=5, # Use 5 samples for the approximation
sampler_seed=18, # Fix randomness seed
)
times = []
start = process_time()
times.append(process_time() - start)

>>> print(times) # Show the gradient computation times in seconds.
[11.582010403000002, 3.9708045299999988]


Warning

As the option use_broadcasting=True adds a broadcasting dimension to the modified circuits, it is not compatible with circuits that already are broadcasted.

Consider a pulse generated by a time-dependent Hamiltonian

$H(\boldsymbol{v}, t) = H_\text{drift} + \sum_j f_j(v_j, t) H_j,$

where $$\boldsymbol{v}=\{v_j\}$$ are variational parameters and $$t$$ is the time. In addition, consider a cost function that is based on using this pulse for a duration $$T$$ in a pulse sequence and measuring the expectation value of an observable. For simplicity we absorb the parts of the sequence before and after the considered pulse into the initial state and the observable, respectively:

$C(\boldsymbol{v}, t) = \bra{\psi_0} U_{\boldsymbol{v}}(T, 0)^\dagger B U_{\boldsymbol{v}}(T, 0)\ket{\psi_0}.$

Here, we denoted the unitary evolution under $$H(\boldsymbol{v}, t)$$ from time $$t_1$$ to $$t_2$$ as $$U_{\boldsymbol{v}(t_2, t_1)}$$. Then the derivative of $$C$$ with respect to a specific parameter $$v_k$$ is given by (see Eqn. (6) of Leng et al. (2022))

$\frac{\partial C}{\partial v_k} = \int_{0}^{T} \mathrm{d}\tau \sum_{j=1}^m \frac{\partial f_j}{\partial v_k}(\boldsymbol{v}, \tau) \widetilde{C_j}(\boldsymbol{v}, \tau).$

Here, the integral ranges over the duration of the pulse, the partial derivatives of the coefficient functions, $$\partial f_j / \partial v_k$$, are computed classically, and $$\widetilde{C_j}$$ is a linear combination of the results from modified pulse sequence executions based on generalized parameter-shift rules (see e.g. Kyriienko and Elfving (2022) or Wierichs et al. (2022) for more details and param_shift() for an implementation of the non-stochastic generalized shift rules) Given the parameter shift rule with coefficients $$\{y_\ell\}$$ and shifts $$\{x_\ell\}$$ for the single-parameter pulse $$\exp(-i \theta H_j)$$, the linear combination is given by

$\begin{split}\widetilde{C_j}(\boldsymbol{v}, \tau)&=\sum_{\ell=1} y_\ell \bra{\psi_{j}(\boldsymbol{v}, x_\ell, \tau)} B \ket{\psi_{j}(\boldsymbol{v}, x_\ell, \tau)} \\ \ket{\psi_{j}(\boldsymbol{v}, x_\ell, \tau)} &= U_{\boldsymbol{v}}(T, \tau) e^{-i x_\ell H_j} U_{\boldsymbol{v}}(\tau, 0)\ket{\psi_0}.\end{split}$

In practice, the time integral over $$\tau$$ is computed by sampling values for the time, evaluating the integrand, and averaging appropriately. The probability distribution used for the sampling may have a significant impact on the quality of the obtained estimates, in particular with regards to their variance. In this function, a uniform distribution over the interval $$[0, t]$$ is used, which often can be improved upon.

Examples

Consider the pulse generated by

$H(\boldsymbol{v}, t) = \frac{1}{2} X\otimes I + v_1 Z\otimes Z + \sin(v_2 t) I\otimes X$

and the observable $$B=I\otimes Y$$. There are two variational parameters, $$v_1$$ and $$v_2$$, for which we may compute the derivative of the cost function:

$\begin{split}\frac{\partial C}{\partial v_1} &= \int_{0}^{T} \mathrm{d}\tau \ \widetilde{C_1}((v_1, v_2), \tau)\\ \frac{\partial C}{\partial v_2} &= \int_{0}^{T} \mathrm{d}\tau \cos(v_2 \tau) \tau \ \widetilde{C_2}((v_1, v_2), \tau)\\ \widetilde{C_j}((v_1, v_2), \tau)&= \bra{\psi_{j}((v_1, v_2), \pi/4, \tau)} B \ket{\psi_{j}((v_1, v_2), \pi/4, \tau)}\\ &-\bra{\psi_{j}((v_1, v_2), -\pi/4, \tau)} B \ket{\psi_{j}((v_1, v_2), -\pi/4, \tau)} \\ \ket{\psi_{j}((v_1, v_2), x, \tau)} &= U_{(v_1, v_2)}(T, \tau) e^{-i x H_j}U_{(v_1, v_2)}(\tau, 0)\ket{0}.\end{split}$

Here we used the partial derivatives

$\begin{split}\frac{\partial f_1}{\partial v_1}&= 1\\ \frac{\partial f_2}{\partial v_2}&= \cos(v_2 t) t \\ \frac{\partial f_1}{\partial v_2}= \frac{\partial f_2}{\partial v_1}&= 0\end{split}$

and the fact that both $$H_1=Z\otimes Z$$ and $$H_2=I\otimes X$$ have two unique eigenvalues and therefore admit a two-term parameter-shift rule (see e.g. Schuld et al. (2018)).

As a second scenario, consider the single-qubit pulse generated by

$H((v_1, v_2), t) = v_1 \sin(v_2 t) X$

together with the observable $$B=Z$$. You may already notice that this pulse can be rewritten as a RX rotation, because we have a single Hamiltonian term and the spectrum of $$H$$ consequently will be constant up to rescaling. In particular, the unitary time evolution under the Schrödinger equation is given by

$\begin{split}U_{(v_1, v_2)}(t_2, t_1) &= \exp\left(-i\int_{t_1}^{t_2} \mathrm{d}\tau v_1 \sin(v_2 \tau) X\right)\\ &=\exp(-i\theta(v_1, v_2) X)\\ \theta(v_1, v_2) &= \int_{t_1}^{t_2} \mathrm{d}\tau v_1 \sin(v_2 \tau)\\ &=-\frac{v_1}{v_2}(\cos(v_2 t_2) - \cos(v_2 t_1)).\end{split}$

As the RX rotation satisfies a (non-stochastic) two-term parameter-shift rule, we could compute the derivatives with respect to $$v_1$$ and $$v_2$$ by implementing $$\exp(-i\theta(v_1, v_2) X)$$, applying the two-term shift rule and evaluating the classical Jacobian of the mapping $$\theta(v_1, v_2)$$.

Using the stochastic parameter-shift rule instead will lead to approximation errors. This is because the approximated integral not only includes the shifted circuit evaluations, which do not depend on $$\tau$$ in this example, but also on the classical Jacobian, which is not constant over $$\tau$$. Therefore, it is important to implement pulses in the simplest way possible.