Gradients and training¶

PennyLane offers seamless integration between classical and quantum computations. Code up quantum circuits in PennyLane, compute gradients of quantum circuits, and connect them easily to the top scientific computing and machine learning libraries.

Training and interfaces¶

The bridge between the quantum and classical worlds is provided in PennyLane via interfaces to automatic differentiation libraries. Currently, three libraries are supported: NumPy, PyTorch, and JAX. PennyLane makes each of these libraries quantum-aware, allowing quantum circuits to be treated just like any other operation. Any automatic differentiation framework can be chosen with any device.

In PennyLane, an automatic differentiation framework is declared using the interface argument when creating a QNode, e.g.,

@qp.qnode(dev, interface="torch")
def my_quantum_circuit(...):
    ...

Note

If no interface is specified, PennyLane will automatically determine the interface based on provided arguments and keyword arguments. See qp.math.SUPPORTED_INTERFACE_NAMES for a list of all accepted interface strings.

Warning

ComplexWarning messages may appear when running differentiable workflows involving both complex and float types, particularly with certain interfaces. These warnings are common in backpropagation due to the nature of complex casting and do not indicate an error in computation. If desired, you can suppress these warnings by adding the following code:

import warnings
import numpy as np
# With NumPy 2.0 or later, use np.exceptions.ComplexWarning instead.
warnings.filterwarnings("ignore", category=np.ComplexWarning)

Specifying an interface allows native numerical objects of the chosen library (NumPy arrays, JAX arrays, and Torch tensors) to be passed as parameters to the quantum circuit. It also makes the gradients of the quantum circuit accessible to the classical library, so that arbitrary hybrid circuits can be optimized with the library’s native optimizers.

When specifying an interface, the objects of the chosen framework are converted into NumPy objects and are passed to a device in most cases. Exceptions include cases when the devices support end-to-end computations in a framework. Such devices may be referred to as backpropagation or passthru devices.

See the links below for walkthroughs of each specific interface:

In addition to the core automatic differentiation frameworks discussed above, PennyLane also provides higher-level classes for converting QNodes into torch.nn layers:

pennylane.qnn.TorchLayer(*args, **kwargs)

Converts a QNode to a Torch layer.

Warning

PennyLane’s QNodes currently promote all torch.float32 (single-precision) inputs to torch.float64 (double-precision) during execution. This may result in higher memory usage than expected.

Note

QNodes that allow for automatic differentiation will always incur a small overhead on evaluation. If you do not need to compute quantum gradients of a QNode, specifying interface=None will remove this overhead and result in a slightly faster evaluation. However, gradients will no longer be available.

Optimizers¶

Optimizers are objects which can be used to automatically update the parameters of a quantum or hybrid machine learning model. Which optimizer to use depends on your choice of classical autodifferentiation library, and each library provides its optimizers from a different access point.

NumPy¶

When using the standard NumPy framework, PennyLane offers some built-in optimizers. Some of these are specific to quantum optimization, such as the QNGOptimizer, RiemannianGradientOptimizer, RotosolveOptimizer, RotoselectOptimizer, ShotAdaptiveOptimizer, and QNSPSAOptimizer.

`AdagradOptimizer`	Gradient-descent optimizer with past-gradient-dependent learning rate in each dimension.
`AdamOptimizer`	Gradient-descent optimizer with adaptive learning rate, first and second moment.
`AdaptiveOptimizer`	Optimizer for building fully trained quantum circuits by adding gates adaptively.
`GradientDescentOptimizer`	Basic gradient-descent optimizer.
`MomentumOptimizer`	Gradient-descent optimizer with momentum.
`NesterovMomentumOptimizer`	Gradient-descent optimizer with Nesterov momentum.
`QNGOptimizer`	Optimizer with adaptive learning rate, via calculation of the diagonal or block-diagonal approximation to the Fubini-Study metric tensor.
`RiemannianGradientOptimizer`	Riemannian gradient optimizer.
`RMSPropOptimizer`	Root mean squared propagation optimizer.
`RotosolveOptimizer`	Rotosolve gradient-free optimizer.
`RotoselectOptimizer`	Rotoselect gradient-free optimizer.
`ShotAdaptiveOptimizer`	Optimizer where the shot rate is adaptively calculated using the variances of the parameter-shift gradient.
`SPSAOptimizer`	The Simultaneous Perturbation Stochastic Approximation method (SPSA) is a stochastic approximation algorithm for optimizing cost functions whose evaluation may involve noise.
`QNSPSAOptimizer`	Quantum natural SPSA (QNSPSA) optimizer.

PyTorch¶

If you are using the PennyLane PyTorch framework, you should import one of the native PyTorch optimizers (found in torch.optim).

TensorFlow¶

Warning

As of PennyLane v0.44, TensorFlow support is no longer maintained. We recommend using the JAX interface or PyTorch interface for machine learning applications to benefit from enhanced support and features. Please consult the following demos for a comprehensive guide on JAX and PyTorch: Turning quantum nodes into Torch Layers and How to optimize a QML model using JAX and Optax.

When using the PennyLane TensorFlow framework, you will need to leverage one of the TensorFlow optimizers (found in tf.keras.optimizers).

JAX¶

Check out the JAXopt and the Optax packages to find optimizers for the PennyLane JAX framework.

For quantum-specific optimizers (not available in Optax), PennyLane offers a built-in JAX-based implementation compatible with jax.jit and qp.qjit.

`QNGOptimizerQJIT`	Optax-like and `jax.jit`/`qp.qjit`-compatible implementation of the `QNGOptimizer`, a step- and parameter-dependent learning rate optimizer, leveraging a reparameterization of the optimization space based on the Fubini-Study metric tensor.
`MomentumQNGOptimizerQJIT`	Optax-like and `jax.jit`/`qp.qjit`-compatible implementation of the `MomentumQNGOptimizer`, a generalized Quantum Natural Gradient (QNG) optimizer considering a discrete-time Langevin equation with QNG force.

Gradients¶

The interface between PennyLane and automatic differentiation libraries relies on PennyLane’s ability to compute or estimate gradients of quantum circuits. There are different strategies to do so, and they may depend on the device used.

When creating a QNode, you can specify the differentiation method like this:

import pennylane as qp

dev = qp.device("default.qubit", wires=1)

@qp.qnode(dev, diff_method="parameter-shift")
def circuit(x):
    qp.RX(x, wires=0)
    return qp.probs(wires=0)

PennyLane currently provides the following differentiation methods for QNodes:

Simulation-based differentiation¶

The following methods use reverse accumulation to compute gradients; a well-known example of this approach is backpropagation. These methods are not hardware compatible; they are only supported on statevector simulator devices such as default.qubit.

However, for rapid prototyping on simulators, these methods typically outperform forward-mode accumulators such as the parameter-shift rule and finite differences. For more details, see the quantum backpropagation demonstration.

"backprop": Use standard backpropagation.

This differentiation method is only allowed on simulator devices that are classically end-to-end differentiable, for example default.qubit. This method does not work on devices that estimate measurement statistics using a finite number of shots; please use the parameter-shift rule instead.

"adjoint": Use a form of backpropagation that takes advantage of the unitary or reversible nature of quantum computation.

The adjoint method reverses through the circuit after a forward pass by iteratively applying the inverse (adjoint) gate. This method is similar to "backprop", but has significantly lower memory usage and a similar runtime.

Hardware-compatible differentiation¶

The following methods support both quantum hardware and simulators, and are examples of forward accumulation. However, when using a simulator, you may notice that the number of circuit executions required to compute the gradients with these methods scales linearly with the number of trainable circuit parameters.

"parameter-shift": Use the analytic parameter-shift rule for all supported quantum operation arguments, with finite-difference as a fallback.
"finite-diff": Use numerical finite-differences for all quantum operation arguments.
"hadamard": Use Hadamard tests on the generators for all compatible quantum operation arguments.
"reversed-hadamard": Use a variant of "hadamard", where the role of the observable being measured and the generators of the unitary operations in the circuit are reversed; the generators are now the observables, and the Pauli decompositions of the observables are now gates in the circuit. More information can be found in the documentation for qp.gradients.hadamard_grad and the original paper outlining this gradient method.
"direct-hadamard": Use a variant of "hadamard", where the additional auxiliary qubit needed in "hadamard" is exchanged for additional circuit executions. More information can be found in the documentation for qp.gradients.hadamard_grad and the original paper outlining this gradient method.
"reversed-direct-hadamard": Use a variant of "direct-hadamard", where the role of the observable and the generators of the unitary operations in the circuit swap in a similar fashion to the "reversed-hadamard" method. More information can be found in the documentation for qp.gradients.hadamard_grad and the original paper outlining this gradient method.
qp.gradients.stoch_pulse_grad: Use a stochastic variant of the parameter-shift rule for pulse programs.
qp.gradients.pulse_odegen: Combine classical processing with the parameter-shift rule for multivariate gates to differentiate pulse programs.
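To illustrate why the number of executions grows linearly with the number of parameters, here is a plain NumPy sketch that does not use PennyLane. It simulates a hypothetical one-qubit circuit RX(a) RY(b) on |0>, counts how often the circuit is evaluated, and compares the two-term parameter-shift rule with a forward finite difference. All function names are illustrative.

```python
import numpy as np

def rx(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

Z = np.diag([1.0, -1.0])
counter = {"executions": 0}

def expval_z(params):
    """Apply RX(params[0]) then RY(params[1]) to |0> and return <Z>."""
    counter["executions"] += 1
    state = ry(params[1]) @ rx(params[0]) @ np.array([1.0, 0.0])
    return float(np.real(np.conj(state) @ Z @ state))

def param_shift_grad(f, params, shift=np.pi / 2):
    """Two-term shift rule; exact for rotation gates such as RX and RY."""
    grad = np.zeros(len(params))
    for i in range(len(params)):
        plus, minus = params.copy(), params.copy()
        plus[i] += shift
        minus[i] -= shift
        grad[i] = (f(plus) - f(minus)) / (2 * np.sin(shift))
    return grad

def finite_diff_grad(f, params, h=1e-7):
    """Forward finite difference; approximate, with error depending on h."""
    base = f(params)
    grad = np.zeros(len(params))
    for i in range(len(params)):
        stepped = params.copy()
        stepped[i] += h
        grad[i] = (f(stepped) - base) / h
    return grad

params = np.array([0.1, 0.2])
shift_gradient = param_shift_grad(expval_z, params)
shift_executions = counter["executions"]  # two executions per parameter
fd_gradient = finite_diff_grad(expval_z, params)
```

With two parameters, the shift rule evaluates the circuit four times, and each additional parameter adds two more evaluations.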

Device gradients¶

"device": Queries the device directly for the gradient. Only allowed on devices that provide their own gradient computation.

Note

If not specified, the default differentiation method is diff_method="best". PennyLane will attempt to determine the best differentiation method given the device and interface. Typically, PennyLane will prioritize device-provided gradients, backpropagation, parameter-shift rule, and finally finite differences, in that order.

Gradient transforms¶

In addition to registering the differentiation method of QNodes to be used with autodifferentiation frameworks, PennyLane also provides a library of gradient transforms via the qp.gradients module.

Quantum gradient transforms are strategies for computing the gradient of a quantum circuit that work by transforming the quantum circuit into one or more gradient circuits. They accompany these circuits with a function that post-processes their output. These gradient circuits, once executed and post-processed, return the gradient of the original circuit.

Examples of quantum gradient transforms include finite-difference rules and parameter-shift rules; these can be applied directly to QNodes:

import pennylane as qp
from pennylane import numpy as np

dev = qp.device("default.qubit", wires=2)

@qp.qnode(dev)
def circuit(weights):
    qp.RX(weights[0], wires=0)
    qp.RY(weights[1], wires=1)
    qp.CNOT(wires=[0, 1])
    qp.RX(weights[2], wires=1)
    return qp.probs(wires=1)

>>> weights = np.array([0.1, 0.2, 0.3], requires_grad=True)
>>> circuit(weights)
tensor([0.9658079, 0.0341921], requires_grad=True)
>>> qp.gradients.param_shift(circuit)(weights)
(tensor([-0.04673668,  0.04673668], requires_grad=True),
 tensor([-0.09442394,  0.09442394], requires_grad=True),
 tensor([-0.14409127,  0.14409127], requires_grad=True))

Note that, while gradient transforms allow quantum gradient rules to be applied directly to QNodes, this is not a replacement for, and should not be used instead of, standard training workflows (for example, qp.grad() if using Autograd, or loss.backward() for PyTorch). This is because gradient transforms do not take into account classical computation nodes, and only support gradients of QNodes. For more details on available gradient transforms, as well as learning how to define your own gradient transform, please see the qp.gradients documentation.

Differentiating gradient transforms and higher-order derivatives¶

Gradient transforms are themselves differentiable, allowing higher-order gradients to be computed:

dev = qp.device("default.qubit", wires=2)

@qp.qnode(dev)
def circuit(weights):
    qp.RX(weights[0], wires=0)
    qp.RY(weights[1], wires=1)
    qp.CNOT(wires=[0, 1])
    qp.RX(weights[2], wires=1)
    return qp.expval(qp.PauliZ(1))

>>> weights = np.array([0.1, 0.2, 0.3], requires_grad=True)
>>> circuit(weights)
tensor(0.9316158, requires_grad=True)
>>> qp.gradients.param_shift(circuit)(weights)  # gradient
(tensor(-0.09347337, requires_grad=True),
 tensor(-0.18884787, requires_grad=True),
 tensor(-0.28818254, requires_grad=True))
>>> def f(weights):
...     return np.stack(qp.gradients.param_shift(circuit)(weights))
>>> qp.jacobian(f)(weights)  # hessian
array([[[-0.9316158 ,  0.01894799,  0.0289147 ],
        [ 0.01894799, -0.9316158 ,  0.05841749],
        [ 0.0289147 ,  0.05841749, -0.9316158 ]]])

Another way to compute higher-order derivatives is by passing the max_diff and diff_method arguments to the QNode and by successive differentiation:

@qp.qnode(dev, diff_method="parameter-shift", max_diff=2)
def circuit(weights):
    qp.RX(weights[0], wires=0)
    qp.RY(weights[1], wires=1)
    qp.CNOT(wires=[0, 1])
    qp.RX(weights[2], wires=1)
    return qp.expval(qp.PauliZ(1))

>>> weights = np.array([0.1, 0.2, 0.3], requires_grad=True)
>>> qp.jacobian(qp.jacobian(circuit))(weights)  # hessian
array([[-0.9316158 ,  0.01894799,  0.0289147 ],
       [ 0.01894799, -0.9316158 ,  0.05841749],
       [ 0.0289147 ,  0.05841749, -0.9316158 ]])

Note that the max_diff argument only applies to gradient transforms and that its default value is 1; failing to set its value correctly may yield incorrect results for higher-order derivatives. Also, passing diff_method="parameter-shift" is equivalent to passing diff_method=qp.gradients.param_shift.
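The idea of successive differentiation can be illustrated without PennyLane. The NumPy sketch below relies on the fact that, for the particular circuit in this section, the expectation value reduces to cos(w0) cos(w1) cos(w2); it then applies the two-term parameter-shift rule twice to build the Hessian. The helper names are hypothetical.

```python
import numpy as np

def expval(w):
    # Closed form of <Z> on wire 1 for the circuit used in this section
    return np.cos(w[0]) * np.cos(w[1]) * np.cos(w[2])

def shift_derivative(f, i, shift=np.pi / 2):
    """Return a function that evaluates df/dw_i with the two-term shift rule."""
    def df(w):
        offset = np.zeros_like(w)
        offset[i] = shift
        return (f(w + offset) - f(w - offset)) / 2
    return df

weights = np.array([0.1, 0.2, 0.3])
n = len(weights)

# First derivatives: one application of the shift rule per parameter
gradient = np.array([shift_derivative(expval, i)(weights) for i in range(n)])

# Second derivatives: apply the shift rule to the shifted derivative itself
hessian = np.array(
    [
        [shift_derivative(shift_derivative(expval, i), j)(weights) for j in range(n)]
        for i in range(n)
    ]
)
```

The resulting gradient and Hessian should agree with the values printed in the examples above.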

Supported configurations¶

The table below shows all the currently supported functionality for the "default.qubit" device. At the moment, it takes into account the following parameters:

The interface, e.g. "jax"
The differentiation method, e.g. "parameter-shift"
The return value of the QNode, e.g. qp.expval() or qp.probs()
The number of shots, either None or an integer > 0

		Return type
Interface	Differentiation method	state	density matrix	probs	sample	expval (obs)	expval (herm)	expval (proj)	var	vn entropy	mutual info
`None`	`"device"`	1	1	1	9	1	1	1	1	1	1
	`"backprop"`	1	1	1	9	1	1	1	1	1	1
	`"adjoint"`	2	2	2	9	2	2	2	2	2	2
	`"parameter-shift"`	2	2	2	9	2	2	2	2	2	2
	`"finite-diff"`	2	2	2	9	2	2	2	2	2	2
	`"spsa"`	2	2	2	9	2	2	2	2	2	2
	`"hadamard"`	2	2	2	9	2	2	2	2	2	2
	`"reversed-hadamard"`	2	2	2	9	2	2	2	2	2	2
	`"direct-hadamard"`	2	2	2	9	2	2	2	2	2	2
	`"reversed-direct-hadamard"`	2	2	2	9	2	2	2	2	2	2
`"autograd"`	`"device"`	3	3	3	9	3	3	3	3	3	3
	`"backprop"`	4	4	5	9	5	5	5	5	5	5
	`"adjoint"`	7	7	7	9	7	7	7	7	7	7
	`"parameter-shift"`	10	10	8	9	8	8	8	8	10	10
	`"finite-diff"`	10	10	8	9	8	8	8	8	8	8
	`"spsa"`	10	10	8	9	8	8	8	8	8	8
	`"hadamard"`	10	10	8	9	8	8	8	11	10	10
	`"reversed-hadamard"`	10	10	2	9	8	8	8	11	10	10
	`"direct-hadamard"`	10	10	2	9	8	8	8	11	10	10
	`"reversed-direct-hadamard"`	10	10	2	9	8	8	8	11	10	10
`"jax"`	`"device"`	3	3	3	9	3	3	3	3	3	3
	`"backprop"`	5	5	5	9	5	5	5	5	5	5
	`"adjoint"`	7	7	7	9	7	7	7	7	7	7
	`"parameter-shift"`	10	10	8	9	8	8	8	8	10	10
	`"finite-diff"`	10	10	8	9	8	8	8	8	8	8
	`"spsa"`	10	10	8	9	8	8	8	8	8	8
	`"hadamard"`	10	10	8	9	8	8	8	11	10	10
	`"reversed-hadamard"`	10	10	2	9	8	8	8	11	10	10
	`"direct-hadamard"`	10	10	2	9	8	8	8	11	10	10
	`"reversed-direct-hadamard"`	10	10	2	9	8	8	8	11	10	10
`"torch"`	`"device"`	3	3	3	9	3	3	3	3	3	3
	`"backprop"`	5	5	5	9	5	5	5	5	5	5
	`"adjoint"`	7	7	7	9	7	7	7	7	7	7
	`"parameter-shift"`	10	10	8	9	8	8	8	8	10	10
	`"finite-diff"`	10	10	8	9	8	8	8	8	8	8
	`"spsa"`	10	10	8	9	8	8	8	8	8	8
	`"hadamard"`	10	10	8	9	8	8	8	11	10	10
	`"reversed-hadamard"`	10	10	2	9	8	8	8	11	10	10
	`"direct-hadamard"`	10	10	2	9	8	8	8	11	10	10
	`"reversed-direct-hadamard"`	10	10	2	9	8	8	8	11	10	10

The numbers in the table refer to the following notes:

1. Not supported. Gradients are not computed even though diff_method is provided. Fails with error.
2. Not supported. Gradients are not computed even though diff_method is provided. Warns that no auto-differentiation framework is being used, but does not fail. Forward pass is still supported.
3. Not supported. The default.qubit device does not provide a native way to compute gradients. See Device jacobian for details.
4. Supported, but only when shots=None. See Backpropagation for details.

   If the circuit returns a state, then the circuit itself is not differentiable directly. However, any real scalar-valued post-processing done to the output of the circuit will be differentiable. See State gradients for details.
5. Supported, but only when shots=None. See Backpropagation for details.
6. Not supported. The adjoint differentiation algorithm is only implemented for analytic simulation. See Adjoint differentiation for details.
7. Supported. Raises an error when shots>0, since the gradient is always computed analytically. See Adjoint differentiation for details.
8. Supported.
9. Not supported. The discretization of the output caused by wave function collapse is not differentiable. The forward pass is still supported. See Sample gradients for details.
10. Not supported. “We just don’t have the theory yet.”
11. Not implemented.
