qml.workflow.interfaces.autograd

This module contains functions for adding the Autograd interface to a PennyLane Device class.

How to bind a custom derivative with autograd.

Suppose I have a function f and I want to change how autograd computes its derivative.

I need to:

  1. Mark it as an autograd primitive with @autograd.extend.primitive

  2. Register its VJP with autograd.extend.defvjp

import autograd
import autograd.numpy  # makes autograd.numpy.array available in the examples below

@autograd.extend.primitive
def f(x, exponent=2):
    return x**exponent

def vjp(ans, x, exponent=2):
    # ``ans`` is the output of ``f``; ``dy`` is the cotangent vector supplied by autograd.
    def grad_fn(dy):
        print(f"Calculating the gradient with {x}, {dy}")
        return dy * exponent * x**(exponent-1)
    return grad_fn

autograd.extend.defvjp(f, vjp, argnums=[0])

>>> autograd.grad(f)(autograd.numpy.array(2.0))
Calculating the gradient with 2.0, 1.0
4.0

The above code tells autograd how to differentiate the first argument of f.

We have an additional problem: autograd does not understand that a QuantumTape contains parameters we want to differentiate. So, in order to match the vjp function with the correct parameters, we need to extract them from the batch of tapes and pass them, unmodified, as the first argument to the primitive. Even though the primitive function does not use the parameters, this is how we communicate to autograd which parameters the derivatives belong to.
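
For illustration only, here is a minimal, self-contained sketch of this pattern with a toy tape and a toy execution function. The names ToyTape, toy_execute_fn, unwrap, execute_batch, and cost are hypothetical and used only for this example; the actual functions in this module (autograd_execute and vjp, listed below) additionally route the backward pass through a jacobian product calculator jpc.

import autograd
import autograd.numpy as anp

class ToyTape:
    """A stand-in for a quantum tape: it just records its trainable parameters."""
    def __init__(self, params):
        self.params = params

def unwrap(value):
    # Strip autograd's tracing box, if present, so the toy "device" only sees
    # raw NumPy data, mirroring how real device execution is opaque to autograd.
    return unwrap(value._value) if hasattr(value, "_value") else value

def toy_execute_fn(tapes):
    # Toy "device execution": one scalar result per tape, sum(p**2).
    return anp.array([anp.sum(unwrap(t.params) ** 2) for t in tapes])

@autograd.extend.primitive
def execute_batch(parameters, tapes, execute_fn):
    # The body never touches ``parameters``; the results come from the tapes.
    # Passing the parameters as the first argument is purely how we tell
    # autograd which values the VJP registered below differentiates.
    return execute_fn(tapes)

def execute_batch_vjp(ans, parameters, tapes, execute_fn):
    def grad_fn(dy):
        # One cotangent entry per tape, matching the structure of ``parameters``.
        # For the toy execution sum(p**2), the Jacobian row is 2 * p.
        return tuple(d * 2 * p for d, p in zip(dy, parameters))
    return grad_fn

autograd.extend.defvjp(execute_batch, execute_batch_vjp, argnums=[0])

def cost(x):
    tapes = [ToyTape(x)]
    # Extract the trainable parameters from the batch of tapes and pass them,
    # wrapped in an autograd-aware container, as the first argument.
    parameters = autograd.builtins.tuple([t.params for t in tapes])
    return execute_batch(parameters, tapes, toy_execute_fn)[0]

>>> autograd.grad(cost)(anp.array([1.0, 2.0]))
array([2., 4.])

The gradient above comes entirely from the registered VJP; autograd only knows that the result depends on x because the parameters were passed as the primitive's first argument.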

Jacobian Calculations and the need for caching:

Suppose we use the above function with an array and take the jacobian:

>>> x = autograd.numpy.array([1.0, 2.0])
>>> autograd.jacobian(f)(x)
Calculating the gradient with [1. 2.], [1. 0.]
Calculating the gradient with [1. 2.], [0. 1.]
array([[2., 0.],
       [0., 4.]])

Here, grad_fn was called once for each output quantity. Each time grad_fn is called, we are forced to recompute exponent * x ** (exponent-1), only to multiply it by a different vector. When executing quantum circuits, computing that quantity can be quite expensive. Autograd naively requests an independent vjp for each entry in the output, even though the underlying circuits are exactly the same.

When caching is enabled, the expensive part (re-executing identical circuits) is avoided, but when normal caching is turned off, the above can lead to an explosion in the number of required circuit executions.

To avoid this explosion in the number of executed circuits when caching is turned off, we instead internally cache the full jacobian so that it is reused between different calls to the same grad_fn. This behaviour is toggled by the cache_full_jacobian keyword argument to TransformJacobianProducts.
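
As a rough sketch of this idea (not the actual implementation), the VJP maker below caches the full jacobian in its closure on the first call to grad_fn, so each subsequent call for another output entry only pays for a cheap vector contraction. The names f_cached and vjp_with_cached_jacobian are hypothetical.

import autograd
import autograd.numpy as anp

@autograd.extend.primitive
def f_cached(x, exponent=2):
    return x**exponent

def vjp_with_cached_jacobian(ans, x, exponent=2):
    cache = {}
    def grad_fn(dy):
        # The expensive quantity is computed only on the first call; every
        # later call for another output entry reuses the cached Jacobian and
        # only performs the cheap contraction with dy.
        if "jac" not in cache:
            print("Computing the full jacobian once")
            cache["jac"] = anp.diag(exponent * x ** (exponent - 1))
        return dy @ cache["jac"]
    return grad_fn

autograd.extend.defvjp(f_cached, vjp_with_cached_jacobian, argnums=[0])

>>> x = anp.array([1.0, 2.0])
>>> autograd.jacobian(f_cached)(x)
Computing the full jacobian once
array([[2., 0.],
       [0., 4.]])

Because the same grad_fn closure is reused for every output entry during the jacobian calculation, the cache is hit on every call after the first.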

Other interfaces are capable of calculating the full jacobian in one call, so this patch is only present for autograd.

Functions

autograd_execute(tapes, execute_fn, jpc[, ...])

Execute a batch of tapes with Autograd parameters on a device.

vjp(ans, parameters, tapes, execute_fn, jpc)

Returns the vector-Jacobian product operator for a batch of quantum tapes.