vjp(tape, dy, gradient_fn, gradient_kwargs=None)

Generate the gradient tapes and processing function required to compute the vector-Jacobian products of a tape.

Consider a function \(\mathbf{f}: \mathbb{R}^n \rightarrow \mathbb{R}^m\), \(\mathbf{x} \mapsto \mathbf{f}(\mathbf{x})\). The Jacobian is given by

\[\begin{split}\mathbf{J}_{\mathbf{f}}(\mathbf{x}) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} &\cdots &\frac{\partial f_1}{\partial x_n}\\ \vdots &\ddots &\vdots\\ \frac{\partial f_m}{\partial x_1} &\cdots &\frac{\partial f_m}{\partial x_n}\\ \end{pmatrix}.\end{split}\]

During backpropagation, the chain rule is applied. For example, consider the cost function \(h = y\circ \mathbf{f}: \mathbb{R}^n \rightarrow \mathbb{R}\), where \(y: \mathbb{R}^m \rightarrow \mathbb{R}\). The gradient is:

\[\nabla h(\mathbf{x}) = \frac{\partial y}{\partial \mathbf{f}} \frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \frac{\partial y}{\partial \mathbf{f}} \mathbf{J}_{\mathbf{f}}(\mathbf{x}).\]

Denoting \(d\mathbf{y} = \frac{\partial y}{\partial \mathbf{f}}\), we can write this as a matrix multiplication:

\[\left[\nabla h(\mathbf{x})\right]_{j} = \sum_{i=1}^m d\mathbf{y}_i ~ \mathbf{J}_{ij}.\]

Thus, the gradient of the cost function is given by the so-called vector-Jacobian product: the product of the row vector \(d\mathbf{y}\), which represents the gradient of the subsequent components of the cost function, and \(\mathbf{J}\), the Jacobian of the current node of interest.
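The identity above can be checked numerically on a small classical example. The following is a minimal sketch in plain Python, not part of the library: the toy map `f`, its hand-written Jacobian, the helper `vector_jacobian_product`, and the cost `h` are all hypothetical names chosen for illustration. It compares the vector-Jacobian product with a central finite-difference gradient of the cost.

```python
import math

def f(x):
    """Toy map f: R^2 -> R^3 (hypothetical, for illustration only)."""
    x1, x2 = x
    return [math.sin(x1), x1 * x2, math.cos(x2)]

def jacobian(x):
    """Analytic Jacobian of f, with J[i][j] = d f_i / d x_j (shape 3 x 2)."""
    x1, x2 = x
    return [
        [math.cos(x1), 0.0],
        [x2, x1],
        [0.0, -math.sin(x2)],
    ]

def vector_jacobian_product(dy, J):
    """Row vector dy (length m) times J (m x n): result[j] = sum_i dy[i] * J[i][j]."""
    m, n = len(J), len(J[0])
    return [sum(dy[i] * J[i][j] for i in range(m)) for j in range(n)]

def h(x):
    """Scalar cost h = y(f(x)) with y(f) = f_1 + 2 f_2 + 3 f_3, so dy = (1, 2, 3)."""
    f1, f2, f3 = f(x)
    return f1 + 2.0 * f2 + 3.0 * f3

x = [0.3, 0.7]
dy = [1.0, 2.0, 3.0]  # gradient of y with respect to the outputs of f

# Gradient of h obtained as the vector-Jacobian product dy @ J.
grad_vjp = vector_jacobian_product(dy, jacobian(x))

# Independent check: central finite differences applied directly to h.
eps = 1e-6
grad_fd = []
for j in range(len(x)):
    x_plus = list(x)
    x_plus[j] += eps
    x_minus = list(x)
    x_minus[j] -= eps
    grad_fd.append((h(x_plus) - h(x_minus)) / (2 * eps))

print(grad_vjp)
print(grad_fd)
```

The two printed lists should agree up to finite-difference error, which illustrates that the full Jacobian is only ever needed through its contraction with \(d\mathbf{y}\).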

Parameters

  • tape (QuantumTape) – quantum tape to differentiate

  • dy (tensor_like) – Gradient-output vector. Must have shape matching the output shape of the corresponding tape.

  • gradient_fn (callable) – the gradient transform to use to differentiate the tape

  • gradient_kwargs (dict) – dictionary of keyword arguments to pass when determining the gradients of tapes


Returns

Vector-Jacobian product, obtained by applying the processing function to the results of the executed gradient tapes. It is None if the tape has no trainable parameters.

Return type

tensor_like or None


Consider the following quantum tape with PyTorch parameters:

import pennylane as qml
import torch

x = torch.tensor([[0.1, 0.2, 0.3],
                  [0.4, 0.5, 0.6]], requires_grad=True, dtype=torch.float64)

ops = [
    qml.RX(x[0, 0], wires=0),
    qml.RY(x[0, 1], wires=1),
    qml.RZ(x[0, 2], wires=0),
    qml.CNOT(wires=[0, 1]),
    qml.RX(x[1, 0], wires=1),
    qml.RY(x[1, 1], wires=0),
    qml.RZ(x[1, 2], wires=1),
]
measurements = [qml.expval(qml.Z(0)), qml.probs(wires=1)]
tape = qml.tape.QuantumTape(ops, measurements)

We can use the vjp function to compute the vector-Jacobian product, given a gradient-output vector dy:

>>> dy = torch.tensor([1., 1., 1.], dtype=torch.float64)
>>> vjp_tapes, fn = qml.gradients.vjp(tape, dy, qml.gradients.param_shift)

Note that dy has shape (3,), matching the output dimension of the tape (1 expectation and 2 probability values).
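The required length of dy can be worked out from the measurements alone. The sketch below is a plain-Python illustration under stated assumptions: the helper `output_dim` and the `(kind, num_wires)` tuple encoding are hypothetical and not part of the library. It assumes that an expectation value contributes one entry and that probabilities over k wires contribute 2**k entries.

```python
def output_dim(measurements):
    """Total number of output values for a list of (kind, num_wires) pairs.

    The tuple encoding is a hypothetical stand-in for real measurement objects.
    """
    total = 0
    for kind, num_wires in measurements:
        if kind == "expval":
            # An expectation value is a single real number.
            total += 1
        elif kind == "probs":
            # One probability per computational basis state of the measured wires.
            total += 2 ** num_wires
        else:
            raise ValueError(f"unsupported measurement kind: {kind}")
    return total

# The tape above: one expectation value plus probabilities on a single wire.
dim = output_dim([("expval", 1), ("probs", 1)])
print(dim)  # 3
```

With these counts, a dy of length 3 lines up with the single expectation value followed by the two probabilities.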

We then execute the VJP tapes and apply the processing function to the results:

>>> dev = qml.device("default.qubit")
>>> vjp = fn(qml.execute(vjp_tapes, dev, gradient_fn=qml.gradients.param_shift, interface="torch"))
>>> vjp
tensor([-1.1562e-01, -1.3862e-02, -9.0841e-03, -1.5214e-16, -4.8217e-01,
         2.1329e-17], dtype=torch.float64, grad_fn=<SumBackward1>)

The output VJP is also differentiable with respect to the tape parameters:

>>> cost = torch.sum(vjp)
>>> cost.backward()
>>> x.grad
tensor([[-1.1025e+00, -2.0554e-01, -1.4917e-01],
        [-1.2490e-16, -9.1580e-01,  0.0000e+00]], dtype=torch.float64)
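To see what differentiating a VJP means without any quantum machinery, the sketch below repeats the same steps on a toy classical function in plain Python. Everything in it is an illustrative assumption: the map f(x1, x2) = (sin x1, x1 x2, cos x2), the helper names, and the use of central finite differences in place of automatic differentiation.

```python
import math

def vjp_of_toy_function(x, dy):
    """VJP dy @ J(x) for the toy map f(x1, x2) = (sin x1, x1 * x2, cos x2)."""
    x1, x2 = x
    return [
        dy[0] * math.cos(x1) + dy[1] * x2,
        dy[1] * x1 - dy[2] * math.sin(x2),
    ]

def cost(x, dy):
    """Scalar cost built from the VJP, mirroring the sum over the VJP entries above."""
    return sum(vjp_of_toy_function(x, dy))

def finite_difference_grad(fun, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function of a list of floats."""
    grad = []
    for j in range(len(x)):
        x_plus = list(x)
        x_plus[j] += eps
        x_minus = list(x)
        x_minus[j] -= eps
        grad.append((fun(x_plus) - fun(x_minus)) / (2 * eps))
    return grad

x = [0.3, 0.7]
dy = [1.0, 1.0, 1.0]

# Gradient of the summed VJP with respect to the parameters x.
cost_grad = finite_difference_grad(lambda p: cost(p, dy), x)

# Closed form for comparison: cost = cos(x1) + x2 + x1 - sin(x2).
expected = [1.0 - math.sin(x[0]), 1.0 - math.cos(x[1])]

print(cost_grad)
print(expected)
```

Because the VJP already contains first derivatives of f, its gradient involves second derivatives of f, which is why a differentiable VJP is useful for higher-order derivatives.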