qml.data

The data subpackage provides functionality to access, store and manipulate quantum datasets.

Note

To start using datasets, please first see the quantum datasets quickstart guide.

Overview

Datasets are generally stored and accessed using the Dataset class. Pre-computed datasets are available for download and can be accessed using the load() or load_interactive() functions. Additionally, users can easily create, write to disk, and read custom datasets using functions within the Dataset class.

attribute(val[, doc])

Creates a dataset attribute that contains both a value and associated metadata.

field([attribute_type, doc, py_type])

Used to define fields on a declarative Dataset.

Dataset([bind, data_name, identifiers])

Base class for Datasets.

DatasetNotWriteableError(bind)

Exception raised when attempting to set an attribute on a dataset whose underlying file is not writeable.

load(data_name[, attributes, folder_path, ...])

Downloads the data if it is not already present in the directory and returns it as a list of Dataset objects.

load_interactive()

Download a dataset using an interactive load prompt.

list_attributes(data_name)

List the attributes that exist for a specific data_name.

list_data_names()

Get list of dataclass IDs.

list_datasets()

Returns a dictionary of the available datasets.

In addition, various dataset types are provided

AttributeInfo([attrs_bind])

Contains metadata that may be assigned to a dataset attribute.

DatasetAttribute([value, info, bind, ...])

The DatasetAttribute class provides an interface for converting Python objects to and from a HDF5 array or Group.

DatasetArray([value, info, bind, parent_and_key])

Attribute type for objects that implement the Array protocol, including numpy arrays and pennylane.math.tensor.

DatasetScalar([value, info, bind, ...])

Attribute type for numbers.

DatasetString([value, info, bind, ...])

Attribute type for strings.

DatasetList([value, info, bind, parent_and_key])

Provides a list-like collection type for Dataset Attributes.

DatasetDict([value, info, bind, parent_and_key])

Provides a dict-like collection for Dataset attribute types.

DatasetOperator([value, info, bind, ...])

DatasetAttribute for pennylane.operation.Operator classes.

DatasetNone([value, info, bind, parent_and_key])

Datasets type for 'None' values.

DatasetMolecule([value, info, bind, ...])

Attribute type for pennylane.qchem.Molecule.

DatasetSparseArray([value, info, bind, ...])

Attribute type for Scipy sparse arrays.

DatasetJSON([value, info, bind, parent_and_key])

Dataset type for JSON-serializable data.

DatasetTuple([value, info, bind, parent_and_key])

Type for tuples.

Datasets

The Dataset class provides a portable storage format for information describing a physical system and its evolution. For example, a dataset for an arbitrary quantum system could have a Hamiltonian, its ground state, and an efficient state-preparation circuit for that state. Datasets can contain a range of object types, including:

  • numpy.ndarray

  • any numeric type

  • Molecule

  • most Operator types

  • list of any supported type

  • dict of any supported type, as long as the keys are strings

For more details on using datasets, please see the quantum datasets quickstart guide.

Creating a Dataset

To create a new dataset in-memory, initialize a new Dataset with the desired attributes:

>>> hamiltonian = qml.Hamiltonian([1., 1.], [qml.Z(0), qml.Z(1)])
>>> eigvals, eigvecs = np.linalg.eigh(qml.matrix(hamiltonian))
>>> dataset = qml.data.Dataset(
...   hamiltonian = hamiltonian,
...   eigen = {"eigvals": eigvals, "eigvecs": eigvecs}
... )
>>> dataset.hamiltonian
1.0 * Z(0) + 1.0 * Z(1)
>>> dataset.eigen
{'eigvals': array([-2.,  0.,  0.,  2.]),
'eigvecs': array([[0.+0.j, 0.+0.j, 0.+0.j, 1.+0.j],
   [0.+0.j, 1.+0.j, 0.+0.j, 0.+0.j],
   [0.+0.j, 0.+0.j, 1.+0.j, 0.+0.j],
   [1.+0.j, 0.+0.j, 0.+0.j, 0.+0.j]])}

Attributes can also be assigned to the instance after creation:

>>> dataset.ground_state = np.transpose(eigvecs)[np.argmin(eigvals)]
>>> dataset.ground_state
array([0.+0.j, 0.+0.j, 0.+0.j, 1.+0.j])

Reading and Writing Datasets

Datasets can be saved to disk for later use. Datasets use the HDF5 format for serialization, which uses the ‘.h5’ file extension.

To save a dataset, use the Dataset.write() method:

>>> my_dataset = Dataset(...)
>>> my_dataset.write("~/datasets/my_dataset.h5")

To open a dataset from a file, use Dataset.open() class method:

>>> my_dataset = Dataset.open("~/datasets/my_dataset.h5", mode="r")

The mode argument follow the standard library convention — r for reading, w- and w for create and overwrite, and ‘a’ for editing. open() can be used to create a new dataset directly on disk:

>>> new_dataset = Dataset.open("~/datasets/new_datasets.h5", mode="w")

By default, any changes made to an opened dataset will be committed directly to the file, which will fail if the file is opened read-only. The "copy" mode can be used to load the dataset into memory and detach it from the file:

>>> my_dataset = Dataset.open("~/dataset/my_dataset/h5", mode="copy")
>>> my_dataset.new_attribute = "abc"

Important

Since opened datasets stream data from the disk, it is not possible to simultaneously access the same dataset from separately running scripts or multiple Jupyter notebooks. To get around this, either make a copy of the dataset in the disk or access the dataset using Dataset.open() with mode="copy".

Attribute Metadata

Dataset attributes can also contain additional metadata, such as docstrings. The attribute() function can be used to attach metadata on assignment or initialization.

>>> hamiltonian = qml.Hamiltonian([1., 1.], [qml.Z(0), qml.Z(1)])
>>> eigvals, eigvecs = np.linalg.eigh(qml.matrix(hamiltonian))
>>> dataset = qml.data.Dataset(hamiltonian = qml.data.attribute(
...     hamiltonian,
...     doc="The hamiltonian of the system"))
>>> dataset.eigen = qml.data.attribute(
...     {"eigvals": eigvals, "eigvecs": eigvecs},
...     doc="Eigenvalues and eigenvectors of the hamiltonian")

This metadata can then be accessed using the Dataset.attr_info() mapping:

>>> dataset.attr_info["eigen"]["doc"]
'Eigenvalues and eigenvectors of the hamiltonian'

Declarative API

When creating datasets to model a physical system, it is common to collect the same data for a system under different conditions or assumptions. For example, a collection of datasets describing a quantum oscillator, which contains the first 1000 energy levels for different masses and force constants.

The datasets declarative API allows us to create subclasses of Dataset that define the required attributes, or ‘fields’, and their associated type and documentation:

class QuantumOscillator(qml.data.Dataset, data_name="quantum_oscillator", identifiers=["mass", "force_constant"]):
    """Dataset describing a quantum oscillator."""

    mass: float = qml.data.field(doc = "The mass of the particle")
    force_constant: float = qml.data.field(doc = "The force constant of the oscillator")
    hamiltonian: qml.Hamiltonian = qml.data.field(doc = "The hamiltonian of the particle")
    energy_levels: np.ndarray = qml.data.field(doc = "The first 1000 energy levels of the system")

The data_name keyword specifies a category or descriptive name for the dataset type, and the identifiers keyword to the class is used to specify fields that function as parameters, i.e they determine the behaviour of the system.

When a QuantumOscillator dataset is created, its attributes will have the documentation from the field definition:

>>> dataset = QuantumOscillator(
...     mass=1,
...     force_constant=0.5,
...     hamiltonian=qml.X(0),
...     energy_levels=np.array([0.1, 0.2])
... )
>>> dataset.attr_info["mass"]["doc"]
'The mass of the particle'