Lightning-GPU device
The lightning.gpu device is an extension of PennyLane's built-in lightning.qubit device. It extends the CPU-focused Lightning simulator to run using the NVIDIA cuQuantum SDK, enabling GPU-accelerated simulation of quantum state-vector evolution.
A lightning.gpu device can be loaded using:
import pennylane as qml
dev = qml.device("lightning.gpu", wires=2)
If the NVIDIA cuQuantum libraries are available, the above device will allow all operations to be performed on a CUDA-capable GPU of generation SM 7.0 (Volta) or later. If the libraries are not correctly installed, or not available on the path, the device will fall back to lightning.qubit and perform all simulation on the CPU.
The lightning.gpu device also directly supports quantum circuit gradients using the adjoint differentiation method. This can be enabled at the PennyLane QNode level with:
@qml.qnode(dev, diff_method="adjoint")
def circuit(params):
    ...
Supported operations and observables
Supported operations:
BasisState: Prepares a single computational basis state.
CNOT: The controlled-NOT operator.
CRot: The controlled-Rot operator.
CRX: The controlled-RX operator.
CRY: The controlled-RY operator.
CRZ: The controlled-RZ operator.
Hadamard: The Hadamard operator.
PauliX: The Pauli X operator.
PauliY: The Pauli Y operator.
PauliZ: The Pauli Z operator.
PhaseShift: Arbitrary single-qubit local phase shift.
ControlledPhaseShift: A qubit-controlled phase shift.
QubitStateVector: Prepares subsystems using the given ket vector in the computational basis.
Rot: Arbitrary single-qubit rotation.
RX: The single-qubit X rotation.
RY: The single-qubit Y rotation.
RZ: The single-qubit Z rotation.
S: The single-qubit phase gate.
T: The single-qubit T gate.
Supported observables:
Hadamard: The Hadamard operator.
Identity: The identity observable \(\mathbb{I}\).
PauliX: The Pauli X operator.
PauliY: The Pauli Y operator.
PauliZ: The Pauli Z operator.
Hamiltonian: Operator representing a Hamiltonian.
Parallel adjoint differentiation support:
The lightning.gpu device directly supports the adjoint differentiation method and can parallelize it over the requested observables. Observable batching can be controlled directly, which allows concurrent calculations to run across multiple available GPUs.
If you are computing a large number of expectation values, or using a large number of wires on your device, it may be best to divide the expectation value calculations evenly across all available GPUs. This reduces the memory cost of the observables per GPU, at the cost of additional compute time. Assuming m observables and n GPUs, the default behaviour is to preallocate all storage for the m observables on a single GPU. To divide the workload among many GPUs, initialize a lightning.gpu device with the batch_obs=True keyword argument:
import pennylane as qml
dev = qml.device("lightning.gpu", wires=20, batch_obs=True)
With the above, each GPU will see at most m/n observables to process, reducing the preallocated memory footprint.
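The m/n bound can be made concrete with a little arithmetic. The sketch below is only an illustration of an even split; the helper name split_observables is hypothetical, and the way the device actually assigns observables to GPUs may differ.

```python
def split_observables(num_observables, num_gpus):
    """Return how many observables each GPU receives under an even split."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    base, extra = divmod(num_observables, num_gpus)
    # The first `extra` GPUs take one additional observable.
    return [base + 1 if i < extra else base for i in range(num_gpus)]


counts = split_observables(num_observables=10, num_gpus=4)
print(counts)       # [3, 3, 2, 2]
print(max(counts))  # 3, i.e. 10 / 4 rounded up
```

When m is not a multiple of n, the busiest GPU handles m/n rounded up, which is the figure to use when checking whether the preallocated storage fits in memory.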
Additionally, when GPU memory is limited, the overall problem may not fit on the requested GPU devices even with the above distribution. You can further reduce the concurrent allocations on the available GPUs by providing an integer value to the batch_obs keyword. For example, to batch-evaluate observables with at most 1 observable allocation per GPU, define the device as:
import pennylane as qml
dev = qml.device("lightning.gpu", wires=27, batch_obs=1)
Each problem is unique, so it is often best to start with the default behaviour and tune with the above options only if necessary.
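To judge whether tuning is likely to be needed, a rough memory estimate can help. The sketch below assumes double-precision complex amplitudes (16 bytes each) and assumes that each concurrently held observable needs a buffer about the size of one state vector. Both assumptions, and the helper names, are illustrative and not taken from the device documentation.

```python
def statevector_bytes(wires, bytes_per_amplitude=16):
    """Memory for one state vector: 2**wires complex amplitudes."""
    return (2 ** wires) * bytes_per_amplitude


def estimate_gib(wires, concurrent_copies):
    """Rough footprint in GiB for several state-vector-sized buffers."""
    return concurrent_copies * statevector_bytes(wires) / 2**30


print(statevector_bytes(27))  # 2147483648 bytes
print(estimate_gib(27, 1))    # 2.0
print(estimate_gib(27, 4))    # 8.0
```

Under these assumptions a single 27-wire state vector already occupies 2 GiB, so each additional concurrent buffer adds about the same amount.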