Release notes

This page contains the release notes for Catalyst.

Release 0.11.0 (development release)

New features since last release

Added a loop boundary optimization pass that identifies and removes redundant quantum operations at loop iteration boundaries, where the last operations of one iteration often cancel the first operations of the next. (#1476)

This optimization eliminates redundant operations in order to reduce quantum circuit depth and gate count. It is supported by the cancel_inverses and merge_rotations passes.

For example,
```
import catalyst
import pennylane as qml

dev = qml.device("lightning.qubit", wires=2)

@qml.qjit
@catalyst.passes.cancel_inverses
@qml.qnode(dev)
def circuit():
    for i in range(3):
        qml.Hadamard(0)
        qml.CNOT([0, 1])
        qml.Hadamard(0)
    return qml.expval(qml.Z(0))
```
Note that this optimization specifically targets operations that are exact inverses of each other when applied in sequence. For example, pairs of consecutive Hadamard gates (H†H = I) will be identified and eliminated.
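To see why boundary operations cancel, it can help to look at the unrolled gate sequence. The sketch below is a toy model in plain Python, not Catalyst's MLIR pass: gates are `(name, wires)` tuples, and both the `SELF_INVERSE` set and the helper `cancel_adjacent_self_inverses` are hypothetical names introduced only for illustration.

```python
# Toy model: gates are (name, wires) tuples; this is not the MLIR pass itself.
SELF_INVERSE = {"Hadamard", "CNOT", "PauliX", "PauliY", "PauliZ"}  # assumed subset


def cancel_adjacent_self_inverses(ops):
    """Drop neighbouring pairs of identical self-inverse gates on the same wires."""
    stack = []
    for op in ops:
        name, _wires = op
        if stack and stack[-1] == op and name in SELF_INVERSE:
            stack.pop()  # the new gate undoes the previous one, so remove both
        else:
            stack.append(op)
    return stack


body = [("Hadamard", (0,)), ("CNOT", (0, 1)), ("Hadamard", (0,))]
unrolled = body * 3  # what `for i in range(3)` applies, gate by gate

optimized = cancel_adjacent_self_inverses(unrolled)
print(len(unrolled), "gates before,", len(optimized), "gates after")
print(optimized)
```

In this toy the cancellation cascades: once the Hadamards meeting at an iteration boundary are removed, the neighbouring CNOTs become adjacent and cancel as well, so three iterations collapse to a single copy of the loop body.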

Conversion of Clifford+T gates to Pauli Product Rotations (PPRs) and of measurements to Pauli Product Measurements (PPMs) is now available through the to_ppr pass transform.

(#1499) (#1551) (#1564)

Supported gate conversions:

- H gate → PPR with (Z · X · Z)π/4
- S gate → PPR with (Z)π/4
- T gate → PPR with (Z)π/8
- CNOT → PPR with (Z ⊗ X)π/4 · (Z ⊗ 1)−π/4 · (1 ⊗ X)−π/4
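The conversions listed above can be checked numerically. The sketch below uses only the Python standard library and assumes the sign convention exp(-iθP) for a rotation about the Pauli word P; that sign is an assumption made here so that Z at π/4 reproduces S rather than its adjoint (H and CNOT come out the same under either sign). All helper names are hypothetical, and equality is tested up to a global phase.

```python
import cmath
import math

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def kron(a, b):
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def ppr(pauli, theta):
    """Assumed convention: exp(-i*theta*P) = cos(theta) I - i sin(theta) P (P @ P = I)."""
    n = len(pauli)
    return [[math.cos(theta) * (i == j) - 1j * math.sin(theta) * pauli[i][j]
             for j in range(n)] for i in range(n)]


def equal_up_to_phase(a, b, tol=1e-9):
    n = len(a)
    for i in range(n):
        for j in range(n):
            if abs(b[i][j]) > tol:  # first non-zero entry fixes the global phase
                phase = a[i][j] / b[i][j]
                return abs(abs(phase) - 1) < tol and all(
                    abs(a[r][c] - phase * b[r][c]) < tol
                    for r in range(n) for c in range(n))
    return False


s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
S = [[1, 0], [0, 1j]]
T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

q = math.pi / 4
h_ppr = matmul(matmul(ppr(Z, q), ppr(X, q)), ppr(Z, q))
cnot_ppr = matmul(matmul(ppr(kron(Z, X), q), ppr(kron(Z, I2), -q)), ppr(kron(I2, X), -q))

checks = {
    "H": equal_up_to_phase(h_ppr, H),
    "S": equal_up_to_phase(ppr(Z, q), S),
    "T": equal_up_to_phase(ppr(Z, math.pi / 8), T),
    "CNOT": equal_up_to_phase(cnot_ppr, CNOT),
}
print(checks)
```

Working with explicit matrices keeps the check independent of any compiler backend, which matters here because the PPR operations themselves are not executable.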

Example:

@qjit(keep_intermediate=True)
@to_ppr
@qml.qnode(dev)
def circuit():
    qml.H(0)
    qml.S(1)
    qml.T(0)
    qml.CNOT([0, 1])
    m1 = catalyst.measure(wires=0)
    m2 = catalyst.measure(wires=1)
    return m1, m2
circuit()

The PPRs and PPMs are currently only represented symbolically. They are not yet executable on any backend, since they exist purely as intermediate representations for analysis and for potential future execution once a suitable backend is available.

Example MLIR Representation:

. . .
  %0 = quantum.alloc( 2) : !quantum.reg
  %1 = quantum.extract %0[ 1] : !quantum.reg -> !quantum.bit
  %2 = qec.ppr ["Z"](4) %1 : !quantum.bit
  %3 = quantum.extract %0[ 0] : !quantum.reg -> !quantum.bit
  %4 = qec.ppr ["Z"](4) %3 : !quantum.bit
  %5 = qec.ppr ["X"](4) %4 : !quantum.bit
  %6 = qec.ppr ["Z"](4) %5 : !quantum.bit
  %7 = qec.ppr ["Z"](8) %6 : !quantum.bit
  %8:2 = qec.ppr ["Z", "X"](4) %7, %2 : !quantum.bit, !quantum.bit
  %9 = qec.ppr ["Z"](-4) %8#0 : !quantum.bit
  %10 = qec.ppr ["X"](-4) %8#1 : !quantum.bit
  %mres, %out_qubits = qec.ppm ["Z"] %9 : !quantum.bit
  %mres_0, %out_qubits_1 = qec.ppm ["Z"] %10 : !quantum.bit
. . .

Commuting Clifford Pauli Product Rotation (PPR) operations to the end of a circuit, past non-Clifford PPRs, is now available through the commute_ppr() pass transform. (#1563)

A PPR is a rotation gate of the form $\exp(iP\theta)$, where $P$ is a Pauli word (a product of Pauli operators). Clifford PPRs refer to PPRs with $\theta = \tfrac{\pi}{4}$, while non-Clifford PPRs have $\theta = \tfrac{\pi}{8}$.
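Since any Pauli word satisfies $P^2 = I$, the exponential has a simple closed form. The expansion below is a standard identity, written out for the two angles mentioned in the text:

```latex
\exp(iP\theta) = \cos(\theta)\, I + i \sin(\theta)\, P
\quad\Longrightarrow\quad
\exp\!\left(iP\tfrac{\pi}{4}\right) = \tfrac{1}{\sqrt{2}}\,(I + iP), \qquad
\exp\!\left(iP\tfrac{\pi}{8}\right) = \cos\!\left(\tfrac{\pi}{8}\right) I + i \sin\!\left(\tfrac{\pi}{8}\right) P .
```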

Example:

     @qjit(keep_intermediate=True)
     @pipeline({"to_ppr": {}, "commute_ppr": {}})
     @qml.qnode(qml.device("null.qubit", wires=1))
     def circuit():
         qml.H(0)
         qml.T(0)
         return measure(0)

The program generated by this pass is currently not executable on any backend. For more information regarding PPMs, please refer to the Pauli Product Measurement page: https://pennylane.ai/compilation/pauli-product-measurement
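Whether a Clifford PPR can move past a non-Clifford PPR unchanged depends on whether their Pauli words commute. The sketch below is a plain-Python illustration of that bookkeeping, not the implementation of commute_ppr: Pauli words are strings such as `"ZX"`, and the helpers `commutes` and `product_word` are hypothetical. When two words anticommute, conjugating by the Clifford rotation turns the other rotation's axis into a word proportional to their product; the sign and phase depend on the convention and are deliberately dropped here.

```python
def commutes(p: str, q: str) -> bool:
    """Two Pauli words commute iff they clash on an even number of qubits."""
    if len(p) != len(q):
        raise ValueError("Pauli words must act on the same number of qubits")
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0


def product_word(p: str, q: str) -> str:
    """Pauli word proportional to P*Q; the phase (+-1 or +-i) is not tracked."""
    def single(a, b):
        if a == "I":
            return b
        if b == "I":
            return a
        if a == b:
            return "I"
        return ({"X", "Y", "Z"} - {a, b}).pop()  # e.g. X*Y is proportional to Z

    return "".join(single(a, b) for a, b in zip(p, q))


# Single-qubit case from the example: an X rotation (part of H) meets a Z rotation (T).
print(commutes("X", "Z"))       # they anticommute, so the axis must be updated
print(product_word("X", "Z"))   # new axis, up to sign

# Two-qubit words that clash on two qubits commute and can simply be swapped.
print(commutes("ZX", "XZ"))
```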

Absorbing Clifford Pauli Product Rotation (PPR) operations into the final Pauli Product Measurement (PPM) is now available through the ppr_to_ppm() pass transform. The output from this pass consists of non-Clifford PPRs and PPMs. (#1577)

Example:
```
@qjit(keep_intermediate=True)
@pipeline({"to_ppr": {}, "commute_ppr": {}, "ppr_to_ppm": {}})
@qml.qnode(qml.device("null.qubit", wires=1))
def circuit():
    qml.H(0)
    qml.T(0)
    return measure(0)
```

Improvements 🛠

Changed pattern rewriting in the quantum-to-ion lowering pass to use MLIR’s dialect conversion infrastructure. (#1442)
Extend merge-rotations peephole optimization pass to also merge compatible rotation gates (either both controlled, or both uncontrolled) where rotation angles are any combination of static constants or dynamic values. (#1489)
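One way to picture merging with mixed static and dynamic angles is the toy below. It is a plain-Python sketch, not the MLIR pass: gates are `(name, wires, angle)` tuples, a `float` angle stands for a static constant, a string stands for a value only known at run time, and the `("add", a, b)` tuple is a hypothetical stand-in for an addition that would be evaluated at run time. Controlled gates and gates on other wires in between are ignored for simplicity.

```python
def merge_angle(a, b):
    if isinstance(a, float) and isinstance(b, float):
        return a + b          # both static: the sum can be folded immediately
    return ("add", a, b)      # otherwise keep a symbolic addition for run time


def merge_rotations(ops):
    """Merge neighbouring rotations with the same name acting on the same wires."""
    merged = []
    for name, wires, angle in ops:
        if merged and merged[-1][0] == name and merged[-1][1] == wires:
            _, _, previous = merged.pop()
            merged.append((name, wires, merge_angle(previous, angle)))
        else:
            merged.append((name, wires, angle))
    return merged


circuit = [
    ("RX", (0,), 0.25),
    ("RX", (0,), 0.5),
    ("RY", (0,), "theta"),  # dynamic angle, unknown until run time
    ("RY", (0,), 0.1),
]
result = merge_rotations(circuit)
print(result)
```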

Catalyst now supports experimental capture of cond, for_loop and while_loop control flow. (#1468) (#1509) (#1521)

To trigger the PennyLane pipeline for capturing the program as a Jaxpr, simply set experimental_capture=True in the qjit decorator.

import pennylane as qml
from catalyst import qjit

dev = qml.device("lightning.qubit", wires=1)

@qjit(experimental_capture=True)
@qml.qnode(dev)
def circuit(x: float):

    def ansatz_true():
        qml.RX(x, wires=0)
        qml.Hadamard(wires=0)

    def ansatz_false():
        qml.RY(x, wires=0)

    qml.cond(x > 1.4, ansatz_true, ansatz_false)()

    return qml.expval(qml.Z(0))

Catalyst now supports experimental capture of PennyLane transforms. (#1544) (#1561) (#1567) (#1578)

To trigger the PennyLane pipeline for capturing the mentioned transforms, simply set experimental_capture=True in the qjit decorator. If available, Catalyst will apply its own pass in replacement of the original transform provided by PennyLane. Otherwise, the transform will be expanded according to PennyLane rules.

import pennylane as qml
from catalyst import qjit

dev = qml.device("lightning.qubit", wires=1)

@qjit(experimental_capture=True)
def func(x: float):
    @qml.transforms.cancel_inverses
    @qml.qnode(dev)
    def circuit(x: float):
        qml.RX(x, wires=0)
        qml.Hadamard(wires=0)
        qml.Hadamard(wires=0)
        return qml.expval(qml.PauliZ(0))

    return circuit(x)

Changes to reduce compile time:
- Turn off MLIR’s verifier. (#1513)
- Remove unnecessary I/O. (#1514)
- Sort improvements to reduce complexity and memory. (#1524)
- Lazy IR canonicalization and LLVMIR textual generation. (#1530)

Catalyst now decomposes non-differentiable gates when they are used inside a gradient method. (#1562) (#1568) (#1569)

Gates that are constant, such as when all parameters are Python or NumPy data types, are not decomposed when this is allowable. For the adjoint differentiation method, this is allowable for the StatePrep, BasisState, and QubitUnitary operations. For the parameter-shift method, this is allowable for all operations.
Changes to support a dynamic number of qubits:
- The qalloc_p custom JAX primitive can now take in a dynamic number of qubits as a tracer and lower it to MLIR. (#1549)
- ComputationalBasisOp can now take in a quantum register in MLIR, instead of an explicit, fixed-size list of qubits. (#1553)
- Non-observable measurements without explicit wires will now compile to ComputationalBasisOp with a quantum register, instead of the explicit list of all qubits on the device. This means the same compiled IR can be reused even if the device changes its number of qubits across runs. This includes probs(), state(), sample(), counts(). (#1565)

Breaking changes 💔

Deprecations 👋

Bug fixes 🐛

Fixed argnums parameter of grad and value_and_grad being ignored. (#1478)
Fixed an issue (#1488) where Catalyst could give incorrect results for circuits containing qml.StatePrep. (#1491)
Fixed an issue (#1501) where using autograph in conjunction with catalyst passes caused a crash. (#1541)
Fixed an issue (#1548) where using autograph in conjunction with catalyst pipeline caused a crash. (#1576)
Fixed an issue (#1547) where using chained catalyst pass decorators caused a crash. (#1576)

Internal changes ⚙️

Updated the call signature for the PLXPR qnode_prim primitive. (#1538)
Updated deprecated access to QNode.execute_kwargs["mcm_config"]; postselect_mode and mcm_method are now accessed instead. (#1452)
from_plxpr now uses the qml.capture.PlxprInterpreter class for reduced code duplication. (#1398)
Improved the error message for invalid measurements in an adjoint() or ctrl() region. (#1425)
Replace ValueRange with ResultRange and Value with OpResult to better align with the semantics of **QubitResult() functions like getNonCtrlQubitResults(). This change ensures clearer intent and usage. Improve the matchAndRewrite function by using replaceAllUsesWith instead of a for loop. (#1426)
Several changes for experimental support of trapped-ion OQD devices have been made, including:
- The get_c_interface method has been added to the OQD device, which enables retrieval of the C++ implementation of the device from Python. This allows qjit to accept an instance of the device and connect to its runtime. (#1420)
- Improved the ion dialect to reduce the amount of redundant generated code. Added a string attribute label to Level. Also changed the levels of a transition from LevelAttr to string. (#1471)
- The region of a ParallelProtocolOp is now always terminated with a ion::YieldOp with explicitly yielded SSA values. This ensures the op is well-formed, and improves readability. (#1475)
- Add a new pass convert-ion-to-llvm which lowers the Ion dialect to llvm dialect. This pass introduces oqd device specific stubs that will be implemented in oqd runtime including: @ __catalyst__oqd__pulse, @ __catalyst__oqd__ParallelProtocol. (#1466)
- The OQD device can now generate OpenAPL JSON specs during runtime. The oqd stubs @ __catalyst__oqd__pulse, and @ __catalyst__oqd__ParallelProtocol, which are called in the llvm dialect after the aforementioned lowering ((#1466)), are defined to produce JSON specs that OpenAPL expects. (#1516)
- The OQD device is moved from frontend/catalyst/third_party/oqd to runtime/lib/backend/oqd. An overall switch, ENABLE_OQD, is added to control the OQD build system from a single entry point. The switch is OFF by default, and OQD can be built from source via make all ENABLE_OQD=ON, or make runtime ENABLE_OQD=ON. (#1508)
- Ion dialect now supports phonon modes using ion.modes operation. (#1517)
- Rotation angles are normalized to avoid negative duration for pulses during ion dialect lowering. (#1517)
- Catalyst now generates OpenAPL programs for PennyLane circuits of up to two qubits using the OQD device. (#1517)
- The end-to-end compilation pipeline for OQD devices is available as an API function. (#1545)
Update source code to comply with changes requested by black v25.1.0. (#1490)
Revert StaticCustomOp in favour of adding helper functions (isStatic(), getStaticParams()) to CustomOp, which preserves the same functionality. More specifically, this reverts [#1387] and [#1396], and modifies [#1484]. (#1558) (#1555)
Updated the C++ standard in the MLIR layer from 17 to 20. (#1229)

Documentation 📝

Contributors ✍️

This release contains contributions from (in alphabetical order):

Joey Carter, Yushao Chen, Zach Goldthorpe, Sengthai Heng, David Ittah, Rohan Nolan Lasrado, Christina Lee, Mehrdad Malekmohammadi, Erick Ochoa Lopez, Andrija Paurevic, Raul Torres, Paul Haochen Wang.

Release 0.10.0 (current release)

New features since last release

Catalyst can now load and apply local MLIR plugins from the PennyLane frontend. (#1287) (#1317) (#1361) (#1370)

Custom compilation passes and dialects in MLIR can be specified for use in Catalyst via a shared object (*.so or *.dylib on macOS) that implements the pass. Details on creating your own plugin can be found in our compiler plugin documentation. At a high level, there are three ways to use a plugin once it’s properly specified:
- apply_pass() can be used on QNodes when there is a Python entry point defined for the plugin. In that case, the plugin and pass should both be specified and separated by a period.
```
@catalyst.passes.apply_pass("plugin_name.pass_name")
@qml.qnode(qml.device("lightning.qubit", wires=1))
def qnode():
    return qml.state()

@qml.qjit
def module():
    return qnode()
```
- apply_pass_plugin() can be used on QNodes when the plugin did not define an entry point. In that case the full filesystem path must be specified in addition to the pass name.
```
from pathlib import Path

@catalyst.passes.apply_pass_plugin(Path("path_to_plugin"), "pass_name")
@qml.qnode(qml.device("lightning.qubit", wires=1))
def qnode():
    return qml.state()

@qml.qjit
def module():
    return qnode()
```
- Alternatively, one or more dialect and pass plugins can be specified in advance in the qjit() decorator, via the pass_plugins and dialect_plugins keyword arguments. The apply_pass() function can then be used without specifying the plugin.
```
from pathlib import Path

plugin = Path("shared_object_file.so")

@catalyst.passes.apply_pass("pass_name")
@qml.qnode(qml.device("lightning.qubit", wires=1))
def qnode():
  qml.Hadamard(wires=0)
  return qml.state()

@qml.qjit(pass_plugins=[plugin], dialect_plugins=[plugin])
def module():
  return qnode()
```
For more information on usage, visit our compiler plugin documentation.

Improvements 🛠

The Catalyst CLI, a command line interface for debugging and dissecting different stages of compilation, is now available under the catalyst command after installing Catalyst with pip. Even though the tool was first introduced in v0.9, it was not yet included in binary distributions of Catalyst (wheels). The full usage instructions are available in the Catalyst CLI documentation. (#1285) (#1368) (#1405)
Lightning devices now support finite-shot expectation values of qml.Hermitian when used with Catalyst. (#451)
The PennyLane state preparation template qml.CosineWindow is now compatible with Catalyst. (#1166)
A development distribution of Python with dynamic linking support (libpython.so) is no longer needed in order to use compile_executable() to generate standalone executables of compiled programs. (#1305)
In Catalyst v0.9 the output of the compiler instrumentation (instrumentation()) had inadvertently been made more verbose by printing timing information for each run of each pass. This change has been reverted. Instead, the qjit() option verbose=True will now instruct the instrumentation to produce this more detailed output. (#1343)
Two additional circuit optimizations have been added to Catalyst: disentangle-CNOT and disentangle-SWAP. The optimizations are available via the passes module. (#1154) (#1407)

The optimizations use a finite state machine to propagate limited qubit state information through the circuit to turn CNOT and SWAP gates into cheaper instructions. The pass is based on the work by J. Liu, L. Bello, and H. Zhou, Relaxed Peephole Optimization: A Novel Compiler Optimization for Quantum Circuits, 2020, arXiv:2012.07711.
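The idea of propagating limited qubit state information can be sketched with a small state machine. The toy below is written for illustration only and is not Catalyst's implementation: each qubit is tracked as one of |0>, |1>, |+>, |-> or "unknown", the transition table covers only three gates, and the function name `disentangle_cnots` is hypothetical. The rewrite rules rely on standard identities: a CNOT whose control is |0> does nothing, a control in |1> reduces it to an X on the target, a target in |+> is unaffected, and a target in |-> kicks a phase back as a Z on the control.

```python
ZERO, ONE, PLUS, MINUS, OTHER = "0", "1", "+", "-", "?"

# How a few single-qubit gates move the tracked state (global phases ignored).
TRANSITIONS = {
    "PauliX": {ZERO: ONE, ONE: ZERO, PLUS: PLUS, MINUS: MINUS},
    "PauliZ": {ZERO: ZERO, ONE: ONE, PLUS: MINUS, MINUS: PLUS},
    "Hadamard": {ZERO: PLUS, ONE: MINUS, PLUS: ZERO, MINUS: ONE},
}


def disentangle_cnots(ops, num_wires):
    state = [ZERO] * num_wires  # every qubit starts in |0>
    out = []
    for name, wires in ops:
        if name == "CNOT":
            control, target = wires
            if state[control] == ZERO:
                continue                               # control never fires: drop
            if state[control] == ONE:
                name, wires = "PauliX", (target,)      # control always fires
            elif state[target] == PLUS:
                continue                               # X leaves |+> unchanged: drop
            elif state[target] == MINUS:
                name, wires = "PauliZ", (control,)     # phase kickback on the control
        if len(wires) == 1:
            (w,) = wires
            state[w] = TRANSITIONS.get(name, {}).get(state[w], OTHER)
        else:
            for w in wires:                            # entangling gate: lose track
                state[w] = OTHER
        out.append((name, wires))
    return out


ops = [
    ("Hadamard", (0,)),
    ("CNOT", (1, 0)),   # control is |0>, so this gate is removed
    ("PauliX", (1,)),
    ("CNOT", (1, 2)),   # control is |1>, so this becomes PauliX on wire 2
    ("CNOT", (0, 2)),   # control is |+>, target is |1>: kept as is
]
simplified = disentangle_cnots(ops, num_wires=3)
print(simplified)
```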

Breaking changes 💔

The minimum supported PennyLane version has been updated to v0.40; backwards compatibility in either direction is not maintained. (#1308)
(Device Developers Only) The way the shots parameter is initialized in C++ device backends is changing. (#1310)

The previous method of including the shot number in the kwargs argument of the device constructor is deprecated and will be removed in the next release (v0.11). Instead, the shots value will be specified exclusively via the existing SetDeviceShots function called at the beginning of a quantum execution. Device developers are encouraged to update their device implementations between this and the next release while both methods are supported.

Similarly, the Sample and Counts functions (and their Partial* equivalents) will no longer provide a shots argument, since they are redundant. The signature of these functions will update in the next release.
(Device Developers Only) The toml-based device schemas have been integrated with PennyLane and updated to a new version schema = 3. (#1275)

Devices with existing TOML schema = 2 will not be compatible with the current release of Catalyst until updated. A summary of the most important changes is listed here:
- operators.gates.native renamed to operators.gates
- operators.gates.decomp and operators.gates.matrix are removed and no longer necessary
- condition property is renamed to conditions
- Entries in the measurement_processes section now expect the full PennyLane class name as opposed to the deprecated mp.return_type shorthand (e.g. ExpectationMP instead of Expval).
- The mid_circuit_measurements field has been replaced with supported_mcm_methods, which expects a list of mcm methods that the device is able to work with (or empty if unsupported).
- A new field has been added, overlapping_observables, which indicates whether a device supports multiple measurements during one execution on overlapping wires.
- The options section has been removed. Instead, the Python device class should define a device_kwargs field holding the name and values of C++ device constructor kwargs.
See the Custom Devices page for the most up-to-date information on integrating your device with Catalyst and PennyLane.

Bug fixes 🐛

Fixed a bug introduced in Catalyst v0.8 that breaks nested invocations of qml.adjoint and qml.ctrl (e.g. qml.adjoint(qml.adjoint(qml.H(0)))). (#1301)
Fixed a bug in compile_executable() when using non-64bit arrays as input to the compiled function, due to incorrectly computed stride information. (#1338)
Fixed a bug in the Catalyst CLI where using checkpoint-stage would cause save-ir-after-each to not work properly. (#1405)

Internal changes ⚙️

Starting with Python 3.12, Catalyst’s binary distributions (wheels) will now follow Python’s Stable ABI, eliminating the need for a separate wheel per minor Python version. To enable this, the following changes have been made:
- Stable ABI wheels are now generated for Python 3.12 and up. (#1357) (#1385)
- Pybind11 has been replaced with nanobind for C++/Python bindings across all components. (#1173) (#1293) (#1391) (#624)
  
  Nanobind has been developed as a natural successor to the pybind11 library and offers a number of advantages like its ability to target Python’s Stable ABI.
- Python C-API calls have been replaced with functions from Python’s Limited API. (#1354)
- The QuantumExtension module for MLIR Python bindings, which relies on pybind11, has been removed. The module was never included in the distributed wheels and could not be converted to nanobind easily due to its dependency on upstream MLIR code. Pybind11 does not support the Python Stable ABI. (#1187)
Catalyst no longer depends on or pins the scipy package. Instead, OpenBLAS is sourced directly from scipy-openblas32 or Accelerate is used. (#1322) (#1328)
The Catalyst plugin for the lightning.qubit device has been migrated from the Catalyst repo to the Lightning repository. This reduces the size of Catalyst’s binary distributions and the build time of the project, by avoiding re-compilation of the lightning source code. (#1227) (#1307) (#1312)
The AutoGraph exception mechanism (allowlist parameter) has been streamlined to only be used in places where it’s required. (#1332) (#1337)
Each QNode now has its own transformation schedule. Instead of relying on the name of the QNode, each QNode now has a transformation module, which denotes the transformation schedule, embedded in its MLIR representation. (#1323)
The apply_registered_pass_p primitive has been removed and the API for scheduling passes to run using the transform dialect has been refactored. In particular, passes are appended to a tuple as they are being registered and they will be run in order. If there are no local passes, the global pass_pipeline is scheduled. Furthermore, this commit also reworks the caching mechanism for primitives, which is important as qnodes and functions are primitives and now that we can apply passes to them, they are distinct based on which passes have been scheduled to run on them. (#1317)
The Catalyst infrastructure has been upgraded to support a dynamic shots parameter for quantum execution. Previously, this value had to be a static compile-time constant, and could not be changed once the program was compiled. Upcoming UI changes will make the feature accessible to users. (#1360)
Several changes for experimental support of trapped-ion OQD devices have been made, including:
- An experimental ion dialect has been added for Catalyst programs targeting OQD trapped-ion quantum devices. (#1260) (#1372)
  
  The ion dialect defines the set of physical properties of the device, such as the ion species and their atomic energy levels, as well as the operations to manipulate the qubits in the trapped-ion system, such as laser pulse durations, polarizations, detuning frequencies, etc.
  
  A new pass, --quantum-to-ion, has also been added to convert logical gate-based circuits in the Catalyst quantum dialect to laser pulse operations in the ion dialect. This pass accepts logical quantum gates from the set {RX, RY, MS}, where MS is the Mølmer–Sørensen gate. Doing so enables the insertion of physical device parameters into the IR, which will be necessary when lowering to OQD’s backend calls. The physical parameters, which are typically obtained from hardware-calibration runs, are read in from TOML files during the --quantum-to-ion conversion. The TOML filepaths are taken in as pass options.
- A plugin and device backend for OQD trapped-ion quantum devices has been added. (#1355) (#1403)
- An MLIR transformation has been added to decompose {T, S, Z, Hadamard, RZ, PhaseShift, CNOT} gates into the set {RX, RY, MS}. (#1226)
Support for OQD devices is still under development, therefore OQD modules are currently not included in binary distributions (wheels) of Catalyst.
The Catalyst IR has been extended to support literal values as opposed to SSA Values for static parameters of quantum gates by adding a new gate called StaticCustomOp, with eventual lowering to the regular CustomOp operation. (#1387) (#1396)
Code readability in the catalyst.pipelines module has been improved, in particular for pipelines with conditionally included passes. (#1194)

Documentation 📝

A new tutorial going through how to write a new MLIR pass is available. The tutorial writes an empty pass that prints hello world. The code for the tutorial is located in a separate GitHub branch. (#872)
The verbose parameter of qjit() was incorrectly listed as verbosity in the API documentation. This is now fixed. (#1440)
Added more details to catalyst-cli documentation specifying available options for checkpoint-stage and default pipelines (#1405)

Contributors ✍️

This release contains contributions from (in alphabetical order):

Astral Cai, Joey Carter, David Ittah, Erick Ochoa Lopez, Mehrdad Malekmohammadi, William Maxwell, Romain Moyard, Shuli Shu, Ritu Thombre, Raul Torres, Paul Haochen Wang.

Release 0.9.0

New features

Catalyst now supports the specification of shot-vectors when used with qml.sample measurements on the lightning.qubit device. (#1051)

Shot-vectors allow shots to be specified as a list of shots, [20, 1, 100], or as a tuple of the form ((num_shots, repetitions), ...) such that ((20, 3), (1, 100)) is equivalent to shots=[20, 20, 20, 1, 1, ..., 1].

This can result in more efficient quantum execution, as a single job representing the total number of shots is executed on the quantum device, with the measurement post-processing then coarse-grained with respect to the shot-vector.
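The shot-vector notation and the coarse-graining step can be sketched in plain Python. The helpers `expand_shot_vector` and `split_samples` below are hypothetical names used for illustration; they only mimic the bookkeeping described above and do not call Catalyst.

```python
def expand_shot_vector(shots):
    """Flatten entries like (num_shots, repetitions) into a plain list of shot counts."""
    flat = []
    for entry in shots:
        if isinstance(entry, int):
            flat.append(entry)
        else:
            num_shots, repetitions = entry
            flat.extend([num_shots] * repetitions)
    return flat


def split_samples(samples, shots):
    """Partition the samples of one combined job into one chunk per shot-vector entry."""
    chunks, start = [], 0
    for n in expand_shot_vector(shots):
        chunks.append(samples[start:start + n])
        start += n
    return chunks


shot_vector = ((5, 2), 7)
flat = expand_shot_vector(shot_vector)
total = sum(flat)                      # size of the single job sent to the device
fake_samples = list(range(total))      # placeholder data standing in for measurements
chunks = split_samples(fake_samples, shot_vector)
print(flat, total, [len(c) for c in chunks])
```

The chunk sizes match the three arrays returned in the example that follows the text, which uses the same shot-vector `((5, 2), 7)`.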

For example,
```
dev = qml.device("lightning.qubit", wires=1, shots=((5, 2), 7))

@qjit
@qml.qnode(dev)
def circuit():
    qml.Hadamard(0)
    return qml.sample()
```
```
>>> circuit()
(Array([[0], [1], [0], [1], [1]], dtype=int64),
Array([[0], [1], [1], [0], [1]], dtype=int64),
Array([[1], [0], [1], [1], [0], [1], [0]], dtype=int64))
```
Note that other measurement types, such as expval and probs, currently do not support shot-vectors.

A new function catalyst.pipeline allows the quantum-circuit-transformation pass pipeline for QNodes within a qjit-compiled workflow to be configured. (#1131) (#1240)

import jax.numpy as jnp
import pennylane as qml
from catalyst import pipeline, qjit

my_passes = {
    "cancel_inverses": {},
    "my_circuit_transformation_pass": {"my-option" : "my-option-value"},
}

dev = qml.device("lightning.qubit", wires=2)

@pipeline(my_passes)
@qml.qnode(dev)
def circuit(x):
    qml.RX(x, wires=0)
    return qml.expval(qml.PauliZ(0))

@qjit
def fn(x):
    return jnp.sin(circuit(x ** 2))

pipeline can also be used to specify different pass pipelines for different parts of the same qjit-compiled workflow:

my_pipeline = {
    "cancel_inverses": {},
    "my_circuit_transformation_pass": {"my-option" : "my-option-value"},
}

my_other_pipeline = {"cancel_inverses": {}}

@qjit
def fn(x):
    circuit_pipeline = pipeline(my_pipeline)(circuit)
    circuit_other = pipeline(my_other_pipeline)(circuit)
    return jnp.abs(circuit_pipeline(x) - circuit_other(x))

The pass pipeline order and options can be configured globally for a qjit-compiled function, by using the circuit_transform_pipeline argument of the qjit() decorator.

my_passes = {
    "cancel_inverses": {},
    "my_circuit_transformation_pass": {"my-option" : "my-option-value"},
}

@qjit(circuit_transform_pipeline=my_passes)
def fn(x):
    return jnp.sin(circuit(x ** 2))

Global and local (via @pipeline) configurations can coexist, however local pass pipelines will always take precedence over global pass pipelines.

The available MLIR passes are listed and documented in the passes module documentation.

A peephole merge rotations pass, which acts similarly to the Python-based PennyLane merge rotations transform, is now available in MLIR and can be applied to QNodes within a qjit-compiled function. (#1162) (#1205) (#1206)

The merge_rotations pass can be provided to the catalyst.pipeline decorator:

from catalyst import pipeline, qjit

my_passes = {
    "merge_rotations": {}
}

dev = qml.device("lightning.qubit", wires=1)

@qjit
@pipeline(my_passes)
@qml.qnode(dev)
def g(x: float):
    qml.RX(x, wires=0)
    qml.RX(x, wires=0)
    qml.Hadamard(wires=0)
    return qml.expval(qml.PauliX(0))

It can also be applied directly to qjit-compiled QNodes via the catalyst.passes.merge_rotations Python decorator:

from catalyst.passes import merge_rotations

@qjit
@merge_rotations
@qml.qnode(dev)
def g(x: float):
    qml.RX(x, wires=0)
    qml.RX(x, wires=0)
    qml.Hadamard(wires=0)
    return qml.expval(qml.PauliX(0))

Static arguments of a qjit-compiled function can now be indicated by name via a static_argnames argument to the qjit decorator. (#1158)

Specified static argument names will be treated as compile-time static values, allowing any hashable Python object to be passed to this function argument during compilation.
```
>>> @qjit(static_argnames="y")
... def f(x, y):
...     print(f"Compiling with y={y}")
...     return x + y
>>> f(0.5, 0.3)
Compiling with y=0.3
Array(0.8, dtype=float64)
```
The function will only be re-compiled if the hash values of the static arguments change. Otherwise, re-using previous static argument values will result in no re-compilation:
```
>>> f(0.1, 0.3)  # no re-compilation occurs
Array(0.4, dtype=float64)
>>> f(0.1, 0.4)  # y changes, re-compilation
Compiling with y=0.4
Array(0.5, dtype=float64)
```
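The caching behaviour described above can be imitated with a small decorator. This is a toy in plain Python, not how qjit is implemented: `toy_jit` is a hypothetical name, "compilation" is simulated by a counter, and the cache is keyed only on the hash values of the static arguments, mirroring the wording above.

```python
import functools
import inspect


def toy_jit(static_argnames):
    names = (static_argnames,) if isinstance(static_argnames, str) else tuple(static_argnames)

    def decorator(fn):
        signature = inspect.signature(fn)
        cache = {}

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            key = tuple(hash(bound.arguments[n]) for n in names)
            if key not in cache:
                wrapper.compilations += 1  # stand-in for an actual compilation
                static = {n: bound.arguments[n] for n in names}
                cache[key] = functools.partial(fn, **static)
            dynamic = {k: v for k, v in bound.arguments.items() if k not in names}
            return cache[key](**dynamic)

        wrapper.compilations = 0
        return wrapper

    return decorator


@toy_jit(static_argnames="y")
def f(x, y):
    return x + y


f(0.5, 0.3)   # first value of y: "compiles"
f(0.1, 0.3)   # same y: cached version is reused
f(0.1, 0.4)   # new y: "compiles" again
print(f.compilations)
```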
Catalyst Autograph now supports updating a single index or a slice of JAX arrays using Python’s array assignment operator syntax. (#769) (#1143)

Using operator assignment syntax in favor of at...op expressions is now possible for the following operations:
- x[i] += y in favor of x.at[i].add(y)
- x[i] -= y in favor of x.at[i].add(-y)
- x[i] *= y in favor of x.at[i].multiply(y)
- x[i] /= y in favor of x.at[i].divide(y)
- x[i] **= y in favor of x.at[i].power(y)
```
@qjit(autograph=True)
def f(x):
    first_dim = x.shape[0]
    result = jnp.copy(x)

    for i in range(first_dim):
      result[i] *= 2  # This is now supported

    return result
```
```
>>> f(jnp.array([1, 2, 3]))
Array([2, 4, 6], dtype=int64)
```
Catalyst now has a standalone compiler tool called catalyst-cli that quantum-compiles MLIR input files into an object file independent of the Python frontend. (#1208) (#1255)

This compiler tool combines three stages of compilation:
1. quantum-opt: Performs the MLIR-level optimizations and lowers the input dialect to the LLVM dialect.
2. mlir-translate: Translates the input in the LLVM dialect into LLVM IR.
3. llc: Performs lower-level optimizations and creates the object file.
catalyst-cli runs all three stages under the hood by default, but it also has the ability to run each stage individually. For example:
```
# Creates both the optimized IR and an object file
catalyst-cli input.mlir -o output.o

# Only performs MLIR optimizations
catalyst-cli --tool=opt input.mlir -o llvm-dialect.mlir

# Only lowers LLVM dialect MLIR input to LLVM IR
catalyst-cli --tool=translate llvm-dialect.mlir -o llvm-ir.ll

# Only performs lower-level optimizations and creates object file
catalyst-cli --tool=llc llvm-ir.ll -o output.o
```
Note that catalyst-cli is only available when Catalyst is built from source, and is not included when installing Catalyst via pip or from wheels.
Experimental integration of the PennyLane capture module is available. It currently only supports quantum gates, without control flow. (#1109)

To trigger the PennyLane pipeline for capturing the program as a Jaxpr, simply set experimental_capture=True in the qjit decorator.
```
import pennylane as qml
from catalyst import qjit

dev = qml.device("lightning.qubit", wires=2)

@qjit(experimental_capture=True)
@qml.qnode(dev)
def circuit():
    qml.Hadamard(0)
    qml.CNOT([0, 1])
    return qml.expval(qml.Z(0))
```

Improvements

Multiple qml.sample calls can now be returned from the same program, and can be structured using Python containers. For example, a program can return a dictionary of the form return {"first": qml.sample(), "second": qml.sample()}. (#1051)
Catalyst now ships with null.qubit, a Catalyst runtime plugin that mocks out all functions in the QuantumDevice interface. This device is provided as a convenience for testing and benchmarking purposes. (#1179)
```
dev = qml.device("null.qubit", wires=1)

@qml.qjit
@qml.qnode(dev)
def g(x):
    qml.RX(x, wires=0)
    return qml.probs(wires=[0])
```

Setting the seed argument in the qjit decorator will now seed sampled results, in addition to mid-circuit measurement results. (#1164)

dev = qml.device("lightning.qubit", wires=1, shots=10)

@qml.qnode(dev)
def circuit(x):
    qml.RX(x, wires=0)
    m = catalyst.measure(0)

    if m:
        qml.Hadamard(0)

    return qml.sample()

@qml.qjit(seed=37, autograph=True)
def workflow(x):
    return jnp.squeeze(jnp.stack([circuit(x) for i in range(4)]))

>>> workflow(1.8)
Array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 0, 0, 1, 1, 0, 0, 1, 0],
       [0, 0, 1, 0, 1, 1, 0, 0, 1, 1],
       [1, 1, 1, 0, 0, 1, 1, 0, 1, 1]], dtype=int64)
>>> workflow(1.8)
Array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 0, 0, 1, 1, 0, 0, 1, 0],
       [0, 0, 1, 0, 1, 1, 0, 0, 1, 1],
       [1, 1, 1, 0, 0, 1, 1, 0, 1, 1]], dtype=int64)

Note that statistical measurement processes such as expval, var, and probs are currently not affected by seeding when shot noise is present.

The cancel_inverses MLIR compilation pass (-remove-chained-self-inverse) now supports cancelling all Hermitian gates, as well as adjoints of arbitrary unitary operations. (#1136) (#1186) (#1211)

For the full list of supported Hermitian gates please see the cancel_inverses documentation in catalyst.passes.
Support is expanded for backend devices that exclusively return samples in the measurement basis. Pre- and post-processing now allows qjit to be used on these devices with qml.expval, qml.var and qml.probs measurements in addition to qml.sample, using the measurements_from_samples transform. (#1106)
Scalar tensors are eliminated from control flow operations in the program, and are replaced with bare scalars instead. This improves compilation time and memory usage at runtime by avoiding heap allocations and reducing the amount of instructions. (#1075)
Catalyst now supports NumPy 2.0. (#1119) (#1182)
Compiling QNodes to asynchronous functions will no longer print to stderr in case of an error. (#645)
Gradient computations have been made more efficient, as calling gradients twice (with the same gradient parameters) will now only lower to a single MLIR function. (#1172)
qml.sample() and qml.counts() on lightning.qubit/kokkos can now be seeded with qjit(seed=...). (#1164) (#1248)
The compiler pass -remove-chained-self-inverse can now also cancel adjoints of arbitrary unitary operations (in addition to the named Hermitian gates). (#1186) (#1211)
Add Lightning-GPU support to Catalyst docs and update tests. (#1254)
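
To give a feel for the sample post-processing mentioned above for devices that only return samples, the following plain-Python sketch estimates probabilities, an expectation value and a variance from computational-basis samples. It is only an illustration of the idea, not the implementation of measurements_from_samples: the helper names are hypothetical, and only the Pauli-Z observable is handled (other observables would first need a basis rotation, which is omitted here).

```python
from collections import Counter


def probs_from_samples(samples, num_wires):
    """Estimate basis-state probabilities from a list of bit tuples."""
    counts = Counter(samples)
    shots = len(samples)
    probs = []
    for index in range(2 ** num_wires):
        # Wire 0 is the most significant bit of the basis-state index.
        bits = tuple((index >> (num_wires - 1 - w)) & 1 for w in range(num_wires))
        probs.append(counts[bits] / shots)
    return probs


def expval_z_from_samples(samples, wire):
    """Estimate <Z> on one wire: bit 0 has eigenvalue +1, bit 1 has -1."""
    return sum(1 - 2 * sample[wire] for sample in samples) / len(samples)


def var_z_from_samples(samples, wire):
    """Estimate Var(Z) = <Z^2> - <Z>^2, using Z^2 = I so that <Z^2> = 1."""
    return 1.0 - expval_z_from_samples(samples, wire) ** 2


samples = [(0, 0), (0, 1), (1, 1), (0, 0)]
probs = probs_from_samples(samples, num_wires=2)
expval = expval_z_from_samples(samples, wire=0)
variance = var_z_from_samples(samples, wire=0)
```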

Breaking changes

The static_size field in the AbstractQreg class has been removed. (#1113)

This reverts a previous breaking change.
Nesting QNodes within one another now raises an error. (#1176)
The debug.compile_from_mlir function has been removed; please use debug.replace_ir instead. (#1181)
The compiler.last_compiler_output function has been removed; please use compiler.get_output_of("last", workspace) instead. (#1208)

Bug fixes

Fixes a bug where the second execution of a function with abstracted axes would fail. (#1247)
Fixes a bug in catalyst.mitigate_with_zne that would lead to incorrectly extrapolated results. (#1213)
Fixes a bug preventing the target of qml.adjoint and qml.ctrl calls from being transformed by AutoGraph. (#1212)
Resolves a bug where mitigate_with_zne does not work properly with shots and devices supporting only counts and samples (e.g., Qrack). (#1165)
Resolves a bug in the vmap function when passing shapeless values to the target. (#1150)
Fixes a bug that resulted in an error message when using qml.cond on callables with arguments. (#1151)
Fixes a bug that prevented taking the gradient of nested accelerate callbacks. (#1156)
Fixes some small issues with scatter lowering: (#1216) (#1217)
- Registers the func dialect as a requirement for running the scatter lowering pass.
- Emits error if %input, %update and %result are not of length 1 instead of segfaulting.
Fixes a performance issue with catalyst.vmap, where the root cause was in the lowering of the scatter operation. (#1214)
Fixes a bug where single gates applied conditionally could not be used in qjit, e.g. qml.cond(x > 1, qml.Hadamard)(wires=0). (#1232)

Internal changes

Removes deprecated PennyLane code across the frontend. (#1168)
Updates Enzyme to version v0.0.149. (#1142)
Adjoint canonicalization is now available in MLIR for CustomOp and MultiRZOp. It can be used with the --canonicalize pass in quantum-opt. (#1205)
Removes the MemMemCpyOptPass in llvm O2 (applied for Enzyme), which reduces bugs when running gradient-like functions. (#1063)
Bufferization of gradient.ForwardOp and gradient.ReverseOp now requires three steps: gradient-preprocessing, gradient-bufferize, and gradient-postprocessing. gradient-bufferize has a new rewrite for gradient.ReturnOp. (#1139)
A new MLIR pass detensorize-scf is added that works in conjunction with the existing linalg-detensorize pass to detensorize input programs. The IR generated by JAX wraps all values in the program in tensors, including scalars, leading to unnecessary memory allocations for programs compiled to CPU via the MLIR-to-LLVM pipeline. (#1075)
Importing Catalyst will now pollute less of JAX’s global variables by using LoweringParameters. (#1152)
Cached primitive lowerings are now used instead of a custom cache structure. (#1159)
Functions with multiple tapes are now split with a new MLIR pass --split-multiple-tapes, with one tape per function. The reset routine that makes a measurement between tapes and inserts an X gate if the measurement result is one is no longer used. (#1017) (#1130)
Prefer creating new qml.devices.ExecutionConfig objects over using the global qml.devices.DefaultExecutionConfig. Doing so helps avoid unexpected bugs and test failures in case the DefaultExecutionConfig object becomes modified from its original state. (#1137)
Remove the old QJITDevice API. (#1138)
The device-capability loading mechanism has been moved into the QJITDevice constructor. (#1141)
Several functions related to device capabilities have been refactored. (#1149)

In particular, the signatures of get_device_capability, catalyst_decompose, catalyst_acceptance, and QJITDevice.__init__ have changed, and the pennylane_operation_set function has been removed entirely.
Catalyst now generates nested modules denoting quantum programs. (#1144)

Similar to MLIR’s gpu.launch_kernel function, Catalyst now supports call_function_in_module. This allows Catalyst to call functions in modules and have modules denote a quantum kernel. This will allow for device-specific optimizations and compilation pipelines.

At the moment, no one is using this. This is just the necessary scaffolding to support device-specific transformations. As such, the module will be inlined to preserve current semantics. However, in the future, we will explore lowering this nested module into other IRs/binary formats and lowering call_function_in_module to something that can dispatch calls to another runtime/VM.

Contributors

This release contains contributions from (in alphabetical order):

Joey Carter, Spencer Comin, Amintor Dusko, Lillian M.A. Frederiksen, Sengthai Heng, David Ittah, Mehrdad Malekmohammadi, Vincent Michaud-Rioux, Romain Moyard, Erick Ochoa Lopez, Daniel Strano, Raul Torres, Paul Haochen Wang.

Release 0.8.0¶

New features

JAX-compatible functions that run on classical accelerators, such as GPUs, via catalyst.accelerate now support autodifferentiation. (#920)

For example,

import catalyst
import jax
import jax.numpy as jnp
from catalyst import qjit, grad

@qjit
@grad
def f(x):
    expm = catalyst.accelerate(jax.scipy.linalg.expm)
    return jnp.sum(expm(jnp.sin(x)) ** 2)

>>> x = jnp.array([[0.1, 0.2], [0.3, 0.4]])
>>> f(x)
Array([[2.80120452, 1.67518663],
       [1.61605839, 4.42856163]], dtype=float64)

Assertions can now be raised at runtime via the catalyst.debug_assert function. (#925)

Python-based exceptions (via raise) and assertions (via assert) will always be evaluated at program capture time, before certain runtime information may be available.

Use debug_assert to instead raise assertions at runtime, including assertions that depend on values of dynamic variables.

For example,
```
from catalyst import debug_assert

@qjit
def f(x):
    debug_assert(x < 5, "x was greater than 5")
    return x * 8
```
```
>>> f(4)
Array(32, dtype=int64)
>>> f(6)
RuntimeError: x was greater than 5
```
Assertions can be disabled globally for a qjit-compiled function via the disable_assertions keyword argument:
```
@qjit(disable_assertions=True)
def g(x):
    debug_assert(x < 5, "x was greater than 5")
    return x * 8
```
```
>>> g(6)
Array(48, dtype=int64)
```

Mid-circuit measurement results when using lightning.qubit and lightning.kokkos can now be seeded via the new seed argument of the qjit decorator. (#936)

The seed argument accepts an unsigned 32-bit integer, which is used to initialize the pseudo-random state at the beginning of each execution of the compiled function. Therefore, different qjit objects with the same seed (including repeated calls to the same qjit) will always return the same sequence of mid-circuit measurement results.

dev = qml.device("lightning.qubit", wires=1)

@qml.qnode(dev)
def circuit(x):
    qml.RX(x, wires=0)
    m = measure(0)

    if m:
        qml.Hadamard(0)

    return qml.probs()

@qjit(seed=37, autograph=True)
def workflow(x):
    return jnp.stack([circuit(x) for i in range(4)])

Repeatedly calling the workflow function above will always result in the same values:

>>> workflow(1.8)
Array([[1. , 0. ],
       [1. , 0. ],
       [1. , 0. ],
       [0.5, 0.5]], dtype=float64)
>>> workflow(1.8)
Array([[1. , 0. ],
       [1. , 0. ],
       [1. , 0. ],
       [0.5, 0.5]], dtype=float64)

Note that setting the seed will not avoid shot-noise stochasticity in terminal measurement statistics such as sample or expval:

dev = qml.device("lightning.qubit", wires=1, shots=10)

@qml.qnode(dev)
def circuit(x):
    qml.RX(x, wires=0)
    m = measure(0)

    if m:
        qml.Hadamard(0)

    return qml.expval(qml.PauliZ(0))

@qjit(seed=37, autograph=True)
def workflow(x):
    return jnp.stack([circuit(x) for i in range(4)])

>>> workflow(1.8)
Array([1. , 1. , 1. , 0.4], dtype=float64)
>>> workflow(1.8)
Array([ 1. ,  1. ,  1. , -0.2], dtype=float64)
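
The reseeding behaviour described above (the pseudo-random state is initialized from the seed at the start of every execution) can be mimicked in plain Python. This is only an analogy using the standard library's `random` module, not Catalyst's actual generator; `make_workflow` and its parameters are hypothetical names chosen for this sketch.

```python
import random


def make_workflow(seed):
    """Return a function that re-seeds its generator at the start of every call."""

    def workflow(p_one, n_measurements=4):
        rng = random.Random(seed)  # fresh pseudo-random state for each execution
        # Each "mid-circuit measurement" yields 1 with probability p_one.
        return [1 if rng.random() < p_one else 0 for _ in range(n_measurements)]

    return workflow


first = make_workflow(37)
second = make_workflow(37)

run_a = first(0.6)
run_b = first(0.6)   # repeated call to the same object
run_c = second(0.6)  # a different object built with the same seed
```

All three runs produce the same sequence, because each call starts from the same seeded state rather than continuing from where the previous call stopped.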

Exponential fitting is now a supported method of zero-noise extrapolation when performing error mitigation in Catalyst using mitigate_with_zne. (#953)

This new functionality fits the data from noise-scaled circuits with an exponential function, and returns the zero-noise value:

from pennylane.transforms import exponential_extrapolate
from catalyst import mitigate_with_zne

dev = qml.device("lightning.qubit", wires=2, shots=100000)

@qml.qnode(dev)
def circuit(weights):
    qml.StronglyEntanglingLayers(weights, wires=[0, 1])
    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1))

@qjit
def workflow(weights, s):
    zne_circuit = mitigate_with_zne(circuit, scale_factors=s, extrapolate=exponential_extrapolate)
    return zne_circuit(weights)

>>> weights = jnp.ones([3, 2, 3])
>>> scale_factors = jnp.array([1, 2, 3])
>>> workflow(weights, scale_factors)
Array(-0.19946598, dtype=float64)
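
To make the fitting step more concrete, here is a minimal plain-Python sketch of one way to extrapolate to zero noise with an exponential model. It assumes the model a * exp(b * s) with no constant offset and fits it by linear least squares on the logarithm of the data; this is an assumption made for illustration and may differ from how exponential_extrapolate is implemented. The function name is hypothetical.

```python
import math


def fit_exponential_zero_noise(scale_factors, values):
    """Fit values ~ a * exp(b * s) and return the zero-noise estimate a.

    Assumes all values are non-zero and share one sign, so that
    log|value| is a linear function of the scale factor s.
    """
    sign = 1.0 if values[0] > 0 else -1.0
    logs = [math.log(sign * v) for v in values]

    n = len(scale_factors)
    mean_s = sum(scale_factors) / n
    mean_log = sum(logs) / n

    # Ordinary least-squares slope and intercept of log|value| against s.
    covariance = sum((s - mean_s) * (l - mean_log) for s, l in zip(scale_factors, logs))
    spread = sum((s - mean_s) ** 2 for s in scale_factors)
    slope = covariance / spread
    intercept = mean_log - slope * mean_s

    # At s = 0 the model evaluates to a = sign * exp(intercept).
    return sign * math.exp(intercept)


scale_factors = [1, 2, 3]
noisy_values = [-0.2 * math.exp(-0.3 * s) for s in scale_factors]
zero_noise = fit_exponential_zero_noise(scale_factors, noisy_values)
```

With synthetic data generated exactly from the model, the fit recovers the prefactor (here -0.2) up to floating-point error.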

A new module is available, catalyst.passes, which provides Python decorators for enabling and configuring Catalyst MLIR compiler passes. (#911) (#1037)

The first pass available is catalyst.passes.cancel_inverses, which enables the -remove-chained-self-inverse MLIR pass that cancels two neighbouring Hadamard gates.

from catalyst.debug import get_compilation_stage
from catalyst.passes import cancel_inverses

dev = qml.device("lightning.qubit", wires=1)

@qml.qnode(dev)
def circuit(x: float):
    qml.RX(x, wires=0)
    qml.Hadamard(wires=0)
    qml.Hadamard(wires=0)
    return qml.expval(qml.PauliZ(0))

@qjit(keep_intermediate=True)
def workflow(x):
    optimized_circuit = cancel_inverses(circuit)
    return circuit(x), optimized_circuit(x)

Catalyst now has debug functions get_compilation_stage and replace_ir to acquire and recompile the IR from a given pipeline pass for functions compiled with keep_intermediate=True. (#981)

For example, consider the following function:
```
@qjit(keep_intermediate=True)
def f(x):
    return x**2
```
```
>>> f(2.0)
4.0
```
Here we use get_compilation_stage to acquire the IR, and then modify %2 = arith.mulf %in, %in_0 : f64 to turn the square function into a cubic one via replace_ir:
```
from catalyst.debug import get_compilation_stage, replace_ir

old_ir = get_compilation_stage(f, "HLOLoweringPass")
new_ir = old_ir.replace(
    "%2 = arith.mulf %in, %in_0 : f64\n",
    "%t = arith.mulf %in, %in_0 : f64\n    %2 = arith.mulf %t, %in_0 : f64\n"
)
replace_ir(f, "HLOLoweringPass", new_ir)
```
The recompilation starts after the given checkpoint stage:
```
>>> f(2.0)
8.0
```
The two functions can also be used independently of each other. Note that get_compilation_stage replaces the print_compilation_stage function; please see the Breaking Changes section for more details.

Catalyst now supports generating executables from compiled functions for the native host architecture using catalyst.debug.compile_executable. (#1003)

>>> @qjit
... def f(x):
...     y = x * x
...     catalyst.debug.print_memref(y)
...     return y
>>> f(5)
MemRef: base@ = 0x31ac22580 rank = 0 offset = 0 sizes = [] strides = [] data =
25
Array(25, dtype=int64)

We can use compile_executable to compile this function to a binary:

>>> from catalyst.debug import compile_executable
>>> binary = compile_executable(f, 5)
>>> print(binary)
/path/to/executable

Executing this function from a shell environment:

$ /path/to/executable
MemRef: base@ = 0x64fc9dd5ffc0 rank = 0 offset = 0 sizes = [] strides = [] data =
25

Improvements

Catalyst has been updated to work with JAX v0.4.28 (exact version match required). (#931) (#995)
Catalyst now supports keyword arguments for qjit-compiled functions. (#1004)
```
>>> @qjit
... @grad
... def f(x, y):
...     return x * y
>>> f(3., y=2.)
Array(2., dtype=float64)
```
Note that the static_argnums argument to the qjit decorator is not supported when passing argument values as keyword arguments.
Support has been added for the jax.numpy.argsort function within qjit-compiled functions. (#901)

Autograph now supports in-place array assignments with static slices. (#843)

For example,

@qjit(autograph=True)
def f(x, y):
    y[1:10:2] = x
    return y

>>> f(jnp.ones(5), jnp.zeros(10))
Array([0., 1., 0., 1., 0., 1., 0., 1., 0., 1.], dtype=float64)

Autograph now works when qjit is applied to a function decorated with vmap, cond, for_loop or while_loop. Previously, stacking the autograph-enabled qjit decorator directly on top of other Catalyst decorators would lead to errors. (#835) (#938) (#942)

from catalyst import vmap, qjit

dev = qml.device("lightning.qubit", wires=2)

@qml.qnode(dev)
def circuit(x):
    qml.RX(x, wires=0)
    return qml.expval(qml.PauliZ(0))

>>> x = jnp.array([0.1, 0.2, 0.3])
>>> qjit(vmap(circuit), autograph=True)(x)
Array([0.99500417, 0.98006658, 0.95533649], dtype=float64)

Runtime memory usage and compilation complexity have been reduced by eliminating some scalar tensors from the IR. This has been done by adding a linalg-detensorize pass at the end of the HLO lowering pipeline. (#1010)

Program verification is extended to confirm that the measurements included in QNodes are compatible with the specified device and settings. (#945) (#962)

>>> dev = qml.device("lightning.qubit", wires=2, shots=None)
>>> @qjit
... @qml.qnode(dev)
... def circuit(params):
...     qml.RX(params[0], wires=0)
...     qml.RX(params[1], wires=1)
...     return {
...         "sample": qml.sample(wires=[0, 1]),
...         "expval": qml.expval(qml.PauliZ(0))
...     }
>>> circuit([0.1, 0.2])
CompileError: Sample-based measurements like sample(wires=[0, 1])
cannot work with shots=None. Please specify a finite number of shots.

On devices that support it, initial state preparation routines qml.StatePrep and qml.BasisState are no longer decomposed when using Catalyst, improving compilation and runtime performance. (#955) (#1047) (#1062) (#1073)

Improved type validation and error messaging have been added to both the catalyst.jvp and catalyst.vjp functions to ensure that the (co)tangent and parameter types are compatible. (#1020) (#1030) (#1031)

For example, providing an integer tangent for a function with float64 parameters will result in an error:

>>> f = lambda x: (2 * x, x * x)
>>> f_jvp = lambda x: catalyst.jvp(f, params=(x,), tangents=(1,))
>>> qjit(f_jvp)(0.5)
TypeError: function params and tangents arguments to catalyst.jvp do not match;
dtypes must be equal. Got function params dtype float64 and so expected tangent
dtype float64, but got tangent dtype int64 instead.

Ensuring that the types match will resolve the error:

>>> f_jvp = lambda x: catalyst.jvp(f, params=(x,), tangents=(1.0,))
>>> qjit(f_jvp)(0.5)
((Array(1., dtype=float64), Array(0.25, dtype=float64)),
 (Array(2., dtype=float64), Array(1., dtype=float64)))
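
The values returned in the example above can be reproduced by hand with forward-mode dual numbers. The sketch below is plain Python rather than Catalyst code: `Dual` and `jvp_sketch` are hypothetical names, only multiplication is supported, and the type check merely imitates the spirit of the validation described above.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """A value together with its directional derivative (tangent)."""

    primal: float
    tangent: float

    def __mul__(self, other):
        if not isinstance(other, Dual):
            other = Dual(float(other), 0.0)  # constants carry a zero tangent
        # Product rule: d(a * b) = a * db + da * b
        return Dual(
            self.primal * other.primal,
            self.primal * other.tangent + self.tangent * other.primal,
        )

    __rmul__ = __mul__


def jvp_sketch(f, x, tangent):
    """Evaluate f at x and push the tangent forward through it."""
    if not isinstance(tangent, float):
        raise TypeError("tangent dtype must match the float parameter dtype")
    outputs = f(Dual(x, tangent))
    return tuple(o.primal for o in outputs), tuple(o.tangent for o in outputs)


f = lambda x: (2 * x, x * x)
primals, tangents = jvp_sketch(f, 0.5, 1.0)
```

Here `primals` is `(1.0, 0.25)` and `tangents` is `(2.0, 1.0)`, matching the result shown above, while passing the integer tangent `1` raises a `TypeError`.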

Add a script for setting up a Frontend-Only Development Environment that does not require compilation, as it uses the TestPyPI wheel shared libraries. (#1022)

Breaking changes

The argnum keyword argument in the grad, jacobian, value_and_grad, vjp, and jvp functions has been renamed to argnums to better match JAX. (#1036)
Return values of qjit-compiled functions that were previously numpy.ndarray are now of type jax.Array instead. This should have minimal impact, but code that depends on the output of qjit-compiled functions being NumPy arrays will need to be updated. (#895)

The print_compilation_stage function has been renamed get_compilation_stage. It no longer prints the IR to the standard output; instead, it simply returns the IR as a string. (#981)

>>> @qjit(keep_intermediate=True)
... def func(x: float):
...     return x
>>> print(get_compilation_stage(func, "HLOLoweringPass"))
module @func {
  func.func public @jit_func(%arg0: tensor<f64>)
  -> tensor<f64> attributes {llvm.emit_c_interface} {
    return %arg0 : tensor<f64>
  }
  func.func @setup() {
    quantum.init
    return
  }
  func.func @teardown() {
    quantum.finalize
    return
  }
}

Support for TOML files in Schema 1 has been disabled. (#960)
The mitigate_with_zne function no longer accepts a degree parameter for polynomial fitting and instead accepts a callable to perform extrapolation. Any qjit-compatible extrapolation function is valid. Keyword arguments can be passed to this function using the extrapolate_kwargs keyword argument in mitigate_with_zne. (#806)
The QuantumDevice API now includes the functions SetState and SetBasisState for simulators that may benefit from instructions that directly set the state. Implementing these methods is optional, and device support can be indicated via the initial_state_prep flag in the TOML configuration file. (#955)

Bug fixes

Catalyst no longer silently converts complex parameters to floats where floats are expected; instead, an error is raised. (#1008)
Fixes a bug where dynamic one-shot did not work when no mid-circuit measurements are present and when the return type is an iterable. (#1060)
Fixes a bug in finding the quantum function jaxpr when using quantum primitives with dynamic one-shot. (#1041)
Fixes a bug where the LegacyDevice number of shots was not correctly extracted when using the legacyDeviceFacade. (#1035)
Catalyst no longer generates a QubitUnitary operation during decomposition if a device doesn’t support it. Instead, the operation that would lead to a QubitUnitary is either decomposed or raises an error. (#1002)
Catalyst now correctly raises an error when qml.density_matrix is used. (#1118)

Catalyst now preserves output PyTrees in QNodes executed with mcm_method="one-shot". (#957)

For example:

dev = qml.device("lightning.qubit", wires=1, shots=20)
@qml.qjit
@qml.qnode(dev, mcm_method="one-shot")
def func(x):
    qml.RX(x, wires=0)
    m_0 = catalyst.measure(0, postselect=1)
    return {"hi": qml.expval(qml.Z(0))}

>>> func(0.9)
{'hi': Array(-1., dtype=float64)}

Fixes a bug where scatter did not work correctly with list indices. (#982)

A = jnp.ones([3, 3]) * 2

def update(A):
    A = A.at[[0, 1], :].set(jnp.ones([2, 3]), indices_are_sorted=True, unique_indices=True)
    return A

>>> print(qjit(update)(A))
[[1. 1. 1.]
 [1. 1. 1.]
 [2. 2. 2.]]

Static arguments can now be passed through a QNode when specified with the static_argnums keyword argument. (#932)

dev = qml.device("lightning.qubit", wires=1)

@qjit(static_argnums=(1,))
@qml.qnode(dev)
def circuit(x, c):
    print("Inside QNode:", c)
    qml.RY(c, 0)
    qml.RX(x, 0)
    return qml.expval(qml.PauliZ(0))

When executing the qjit-compiled function above, c will be a static variable with value known at compile time:

>>> circuit(0.5, 0.5)
Inside QNode: 0.5
Array(0.77015115, dtype=float64)

Changing the value of c will result in re-compilation:

>>> circuit(0.5, 0.8)
Inside QNode: 0.8
Array(0.61141766, dtype=float64)
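
The recompilation behaviour can be pictured as a cache keyed on the static argument. The following plain-Python sketch is only an analogy: `compiled_cache` and `compile_for` are hypothetical names, and the closed-form expression cos(c) * cos(x) is the analytic expectation value of PauliZ after RY(c) and RX(x) on the |0> state, used here in place of a real simulation.

```python
import math

compiled_cache = {}  # maps the static value c to a function specialised for it


def compile_for(c):
    """Stand-in for compilation: c is baked into the returned function."""
    print("Inside QNode:", c)             # runs only when a new value of c is compiled
    cos_c = math.cos(c)                   # evaluated once, at "compile time"
    return lambda x: cos_c * math.cos(x)  # <Z> after RY(c) then RX(x) on |0>


def circuit(x, c):
    if c not in compiled_cache:
        compiled_cache[c] = compile_for(c)
    return compiled_cache[c](x)


first = circuit(0.5, 0.5)  # compiles for c = 0.5
again = circuit(0.7, 0.5)  # reuses the cached function, nothing is printed
other = circuit(0.5, 0.8)  # new static value, so a second "compilation" happens
```

Only two entries end up in the cache, one per distinct value of c, and the dynamic argument x can change freely without triggering the print statement again.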

Fixes a bug where Catalyst would fail to apply quantum transforms and preserve QNode configuration settings when Autograph was enabled. (#900)

pure_callback will no longer cause a crash in the compiler if the return type signature is declared incorrectly and the callback function is differentiated. (#916)

Instead, this is caught early and a useful error message is returned:

@catalyst.pure_callback
def callback_fn(x) -> jax.ShapeDtypeStruct((2,), jnp.float32):
    return np.array([np.sin(x), np.cos(x)])

callback_fn.fwd(lambda x: (callback_fn(x), x))
callback_fn.bwd(lambda x, dy: (jnp.array([jnp.cos(x), -jnp.sin(x)]) @ dy,))

@qjit
@catalyst.grad
def f(x):
    return jnp.sum(callback_fn(jnp.sin(x)))

>>> f(0.54)
TypeError: Callback callback_fn expected type ShapedArray(float32[2]) but observed ShapedArray(float64[2]) in its return value

AutoGraph will now correctly convert conditional statements where the condition is a non-boolean static value. (#944)

Internally, statically known non-boolean predicates (such as 1) will be converted to bool:
```
@qml.qjit(autograph=True)
def workflow(x):
    n = 1

    if n:
        y = x ** 2
    else:
        y = x

    return y
```

value_and_grad will now correctly differentiate functions with multiple arguments. Previously, attempting to differentiate functions with multiple arguments, or pass the argnums argument, would result in an error. (#1034)

@qjit
def g(x, y, z):
    def f(x, y, z):
        return x * y ** 2 * jnp.sin(z)
    return catalyst.value_and_grad(f, argnums=[1, 2])(x, y, z)

>>> g(0.4, 0.2, 0.6)
(Array(0.00903428, dtype=float64),
 (Array(0.0903428, dtype=float64), Array(0.01320537, dtype=float64)))
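
As a sanity check on the output above, the value and the partial derivatives with respect to y and z (argnums 1 and 2) can be written out analytically. The sketch below uses only the Python standard library; `value_and_grad_yz` is a hypothetical helper name, not part of Catalyst.

```python
import math


def f(x, y, z):
    return x * y ** 2 * math.sin(z)


def value_and_grad_yz(x, y, z):
    """Return f and its partial derivatives with respect to y and z."""
    value = f(x, y, z)
    df_dy = 2 * x * y * math.sin(z)   # d/dy of x * y**2 * sin(z)
    df_dz = x * y ** 2 * math.cos(z)  # d/dz of x * y**2 * sin(z)
    return value, (df_dy, df_dz)


value, (df_dy, df_dz) = value_and_grad_yz(0.4, 0.2, 0.6)
```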

A bug is fixed in catalyst.debug.get_cmain to support multi-dimensional arrays as function inputs. (#1003)
Fixes a bug that occurred when parameter annotations return strings. (#1078)
In certain cases, jax.scipy.linalg.expm may return incorrect numerical results when used within a qjit-compiled function. A warning will now be raised when jax.scipy.linalg.expm is used to inform of this issue.

In the meantime, we strongly recommend using the catalyst.accelerate function within qjit-compiled functions to call jax.scipy.linalg.expm directly.
```
@qjit
def f(A):
    B = catalyst.accelerate(jax.scipy.linalg.expm)(A)
    return B
```
Note that this PR doesn’t actually fix the aforementioned numerical errors, and just raises a warning. (#1082)

Documentation

A page has been added to the documentation, listing devices that are Catalyst compatible. (#966)

Internal changes

Adds catalyst.from_plxpr.from_plxpr for converting a PennyLane variant jaxpr into a Catalyst variant jaxpr. (#837)
Catalyst now uses Enzyme v0.0.130. (#898)
When memrefs do not have an identity layout, memref copy operations are replaced by the linalg copy operation. This operation does not rely on a runtime function but instead lowers to the scf and standard dialects. It also ensures better compatibility with Enzyme. (#917)
LLVM’s O2 optimization pipeline and Enzyme’s AD transformations are now only run in the presence of gradients, significantly improving compilation times for programs without derivatives. Similarly, LLVM’s coroutine lowering passes only run when async_qnodes is enabled in the QJIT decorator. (#968)
The function inactive_callback was renamed __catalyst_inactive_callback. (#899)
The function __catalyst_inactive_callback has the nofree attribute. (#898)
catalyst.dynamic_one_shot uses postselect_mode="pad-invalid-samples" in favour of interface="jax" when processing results. (#956)
Callbacks now have nicer identifiers in their MLIR representation. The identifiers include the name of the Python function being called back into. (#919)
Fix tracing of SProd operations to bring Catalyst in line with PennyLane v0.38. (#935)

After some changes in PennyLane, SProd.terms() returns the terms as leaves instead of a tree. This means that we need to manually trace each term and finally multiply it with the coefficients to create a Hamiltonian.
The function mitigate_with_zne accommodates a folding input argument for specifying the type of circuit folding technique to be used by the error-mitigation routine (only the global value is supported to date). (#946)
Catalyst’s implementation of the Lightning Kokkos plugin has been removed in favor of Lightning’s own. (#974)
The validate_device_capabilities function is considered obsolete. Hence, it has been removed. (#1045)

Contributors

This release contains contributions from (in alphabetical order):

Joey Carter, Alessandro Cosentino, Lillian M. A. Frederiksen, David Ittah, Josh Izaac, Christina Lee, Kunwar Maheep Singh, Mehrdad Malekmohammadi, Romain Moyard, Erick Ochoa Lopez, Mudit Pandey, Nate Stemen, Raul Torres, Tzung-Han Juang, Paul Haochen Wang.

Release 0.7.0¶

New features

Add support for accelerating classical processing via JAX with catalyst.accelerate. (#805)

Classical code that can be just-in-time compiled with JAX can now be seamlessly executed on GPUs or other accelerators with catalyst.accelerate, right inside of QJIT-compiled functions.
```
@accelerate(dev=jax.devices("gpu")[0])
def classical_fn(x):
    return jnp.sin(x) ** 2

@qjit
def hybrid_fn(x):
    y = classical_fn(jnp.sqrt(x)) # will be executed on a GPU
    return jnp.cos(y)
```
Available devices can be retrieved via jax.devices(). If not provided, the default value of jax.devices()[0] as determined by JAX will be used.

Catalyst callback functions, such as pure_callback, debug.callback, and debug.print, now all support auto-differentiation. (#706) (#782) (#822) (#834) (#882) (#907)

When using callbacks that do not return any values, such as catalyst.debug.callback and catalyst.debug.print, these functions are marked as ‘inactive’ and do not contribute to or affect the derivative of the function:

import logging

log = logging.getLogger(__name__)
log.setLevel(logging.INFO)

@qml.qjit
@catalyst.grad
def f(x):
    y = jnp.cos(x)
    catalyst.debug.print("Debug print: y = {0:.4f}", y)
    catalyst.debug.callback(lambda _: log.info("Value of y = %s", _))(y)
    return y ** 2

>>> f(0.54)
INFO:__main__:Value of y = 0.8577086813638242
Debug print: y = 0.8577
array(-0.88195781)

Callbacks that do return values and may affect the qjit-compiled function’s computation, such as pure_callback, may have custom derivatives manually registered with the Catalyst compiler in order to support differentiation.

This can be done via the pure_callback.fwd and pure_callback.bwd methods, to specify how the forwards and backwards pass (the vector-Jacobian product) of the callback should be computed:

@catalyst.pure_callback
def callback_fn(x) -> float:
    return np.sin(x[0]) * x[1]

@callback_fn.fwd
def callback_fn_fwd(x):
    # returns the evaluated function as well as residual
    # values that may be useful for the backwards pass
    return callback_fn(x), x

@callback_fn.bwd
def callback_fn_vjp(res, dy):
    # Accepts residuals from the forward pass, as well
    # as (one or more) cotangent vectors dy, and returns
    # a tuple of VJPs corresponding to each input parameter.

    def vjp(x, dy) -> (jax.ShapeDtypeStruct((2,), jnp.float64),):
        return (np.array([np.cos(x[0]) * dy * x[1], np.sin(x[0]) * dy]),)

    # The VJP function can also be a pure callback
    return catalyst.pure_callback(vjp)(res, dy)

@qml.qjit
@catalyst.grad
def f(x):
    y = jnp.array([jnp.cos(x[0]), x[1]])
    return jnp.sin(callback_fn(y))

>>> x = jnp.array([0.1, 0.2])
>>> f(x)
array([-0.01071923,  0.82698717])
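
The registered VJP can be checked by hand: the gradient of f is obtained by pulling the cotangent of the outer sine back through the callback's VJP and then through y = [cos(x[0]), x[1]]. The sketch below redoes this chain rule in plain Python and compares it against central finite differences. It mirrors the formulas in the example above but does not use Catalyst; `grad_f` and `finite_difference` are hypothetical helper names.

```python
import math


def callback_fn(y):
    return math.sin(y[0]) * y[1]


def callback_vjp(y, dy):
    # Same formula as the registered backwards pass above.
    return (math.cos(y[0]) * dy * y[1], math.sin(y[0]) * dy)


def f(x):
    y = (math.cos(x[0]), x[1])
    return math.sin(callback_fn(y))


def grad_f(x):
    y = (math.cos(x[0]), x[1])
    dy = math.cos(callback_fn(y))              # cotangent coming from the outer sin
    vjp_y0, vjp_y1 = callback_vjp(y, dy)       # pull back through the callback
    return (-math.sin(x[0]) * vjp_y0, vjp_y1)  # pull back through y = [cos(x0), x1]


def finite_difference(x, eps=1e-6):
    """Central finite-difference approximation of the gradient of f."""
    grads = []
    for i in range(len(x)):
        plus, minus = list(x), list(x)
        plus[i] += eps
        minus[i] -= eps
        grads.append((f(plus) - f(minus)) / (2 * eps))
    return tuple(grads)


analytic = grad_f([0.1, 0.2])
numeric = finite_difference([0.1, 0.2])
```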

Catalyst now supports the ‘dynamic one shot’ method for simulating circuits with mid-circuit measurements, which, compared to other methods, may be advantageous for circuits with many mid-circuit measurements executed for few shots. (#5617) (#798)

The dynamic one shot method evaluates dynamic circuits by executing them one shot at a time via catalyst.vmap, sampling a dynamic execution path for each shot. This method only works for a QNode executing with finite shots, and it requires the device to support mid-circuit measurements natively.

This new mode can be specified by using the mcm_method argument of the QNode:
```
dev = qml.device("lightning.qubit", wires=5, shots=20)

@qml.qjit(autograph=True)
@qml.qnode(dev, mcm_method="one-shot")
def circuit(x):

    for i in range(10):
        qml.RX(x, 0)
        m = catalyst.measure(0)

        if m:
            qml.RY(x ** 2, 1)

        x = jnp.sin(x)

    return qml.expval(qml.Z(1))
```
Catalyst’s existing method for simulating mid-circuit measurements remains available via mcm_method="single-branch-statistics".

When using mcm_method="one-shot", the postselect_mode keyword argument can also be used to specify whether the returned result should include shots-number of postselected measurements ("fill-shots"), or whether results should include all results, including invalid postselections ("hw-like"):
```
@qml.qjit
@qml.qnode(dev, mcm_method="one-shot", postselect_mode="hw-like")
def func(x):
    qml.RX(x, wires=0)
    m_0 = catalyst.measure(0, postselect=1)
    return qml.sample(wires=0)
```
```
>>> res = func(0.9)
>>> res
array([-2147483648, -2147483648,           1, -2147483648, -2147483648,
       -2147483648, -2147483648,           1, -2147483648, -2147483648,
       -2147483648, -2147483648,           1, -2147483648, -2147483648,
       -2147483648, -2147483648, -2147483648, -2147483648, -2147483648])
>>> jnp.delete(res, jnp.where(res == np.iinfo(np.int32).min)[0])
Array([1, 1, 1], dtype=int64)
```
Note that invalid shots will not be discarded, but will be replaced by np.iinfo(np.int32).min. They will not be used for processing final results (like expectation values), but they will appear in the output of QNodes that return samples directly.

For more details, see the dynamic quantum circuit documentation.
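
The difference between the two postselection modes can be sketched with a toy single-qubit model in plain Python. This is a classical imitation under stated assumptions, not Catalyst's implementation: the probability of measuring 1 after RX(x) on |0> is sin(x/2)**2, "fill-shots" is modelled here by simply retrying invalid shots, and `one_shot_samples` and `INVALID` are hypothetical names.

```python
import math
import random

INVALID = -2 ** 31  # same value as np.iinfo(np.int32).min


def one_shot_samples(x, shots, postselect, mode, seed=0):
    """Sample one measurement outcome per shot, applying postselection."""
    rng = random.Random(seed)
    p_one = math.sin(x / 2) ** 2  # probability of measuring 1 after RX(x) on |0>
    samples = []
    while len(samples) < shots:
        m = 1 if rng.random() < p_one else 0
        if m == postselect:
            samples.append(m)        # valid shot
        elif mode == "hw-like":
            samples.append(INVALID)  # keep the shot, but mark it as invalid
        # in "fill-shots" mode an invalid shot is retried instead of recorded
    return samples


hw_like = one_shot_samples(0.9, shots=20, postselect=1, mode="hw-like")
valid_only = [s for s in hw_like if s != INVALID]
filled = one_shot_samples(0.9, shots=20, postselect=1, mode="fill-shots")
```

In "hw-like" mode the output always has one entry per shot, with invalid entries that the caller filters out, whereas in "fill-shots" mode every returned entry satisfies the postselection.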

Catalyst now has support for returning qml.sample(m) where m is the result of a mid-circuit measurement. (#731)

When used with mcm_method="one-shot", this will return an array with one measurement result for each shot:

dev = qml.device("lightning.qubit", wires=2, shots=10)

@qml.qjit
@qml.qnode(dev, mcm_method="one-shot")
def func(x):
    qml.RX(x, wires=0)
    m = catalyst.measure(0)
    qml.RX(x ** 2, wires=0)
    return qml.sample(m), qml.expval(qml.PauliZ(0))

>>> func(0.9)
(array([0, 1, 0, 0, 0, 0, 1, 0, 0, 0]), array(0.4))

In mcm_method="single-branch-statistics" mode, it will be equivalent to returning m directly from the quantum function — that is, it will return a single boolean corresponding to the measurement in the branch selected:

@qml.qjit
@qml.qnode(dev, mcm_method="single-branch-statistics")
def func(x):
    qml.RX(x, wires=0)
    m = catalyst.measure(0)
    qml.RX(x ** 2, wires=0)
    return qml.sample(m), qml.expval(qml.PauliZ(0))

>>> func(0.9)
(array(False), array(0.8))

A new function, catalyst.value_and_grad, returns both the result of a function and its gradient with a single forward and backwards pass. (#804) (#859)

This can be more efficient, and reduce overall quantum executions, compared to separately executing the function and then computing its gradient.

For example:

dev = qml.device("lightning.qubit", wires=3)

@qml.qnode(dev)
def circuit(x):
    qml.RX(x, wires=0)
    qml.CNOT(wires=[0, 1])
    qml.RX(x, wires=2)
    return qml.probs()

@qml.qjit
@catalyst.value_and_grad
def cost(x):
    return jnp.sum(jnp.cos(circuit(x)))

>>> cost(0.543)
(array(7.64695856), array(0.33413963))

Autograph now supports single index JAX array assignments (#717)

When using Autograph, syntax of the form x[i] = y where i is a single integer will now be automatically converted to the JAX equivalent of x = x.at[i].set(y):

@qml.qjit(autograph=True)
def f(array):
    result = jnp.ones(array.shape, dtype=array.dtype)

    for i, x in enumerate(array):
        result[i] = result[i] + x * 3

    return result

>>> f(jnp.array([-0.1, 0.12, 0.43, 0.54]))
array([0.7 , 1.36, 2.29, 2.62])

Catalyst now supports dynamically-shaped arrays in control-flow primitives. Arrays with dynamic shapes can now be used with for_loop, while_loop, and cond primitives. (#775) (#777) (#830)

@qjit
def f(shape):
    a = jnp.ones([shape], dtype=float)

    @for_loop(0, 10, 2)
    def loop(i, a):
        return a + i

    return loop(a)

>>> f(3)
array([21., 21., 21.])

Support has been added for disabling Autograph for specific functions. (#705) (#710)

The decorator catalyst.disable_autograph allows one to disable Autograph from auto-converting specific external functions when called within a qjit-compiled function with autograph=True:

def approximate_e(n):
    num = 1.
    fac = 1.
    for i in range(1, n + 1):
        fac *= i
        num += 1. / fac
    return num

@qml.qjit(autograph=True)
def g(x: float, N: int):

    for i in range(N):
        x = x + catalyst.disable_autograph(approximate_e)(10) / x ** i

    return x

>>> g(0.1, 10)
array(4.02997319)

Note that for Autograph to be disabled, the decorated function must be defined outside the qjit-compiled function. If it is defined within the qjit-compiled function, it will continue to be converted with Autograph.

In addition, Autograph can also be disabled for all externally defined functions within a qjit-compiled function via the context manager syntax:

@qml.qjit(autograph=True)
def g(x: float, N: int):

    for i in range(N):
        with catalyst.disable_autograph:
            x = x + approximate_e(10) / x ** i

    return x

Support has been added for specifying a list of (sub)modules to be allowlisted for Autograph conversion. (#725)

Although library code is not meant to be targeted by Autograph conversion, it sometimes makes sense to enable it for specific submodules that might benefit from such conversion:
```
@qjit(autograph=True, autograph_include=["excluded_module.submodule"])
def f(x):
  return excluded_module.submodule.func(x)
```
For example, this might be useful when importing functionality from PennyLane (such as a transform or decomposition) whose control flow you would like Autograph to capture and convert.

Controlled operations that do not have a matrix representation defined are now supported via applying PennyLane’s decomposition. (#831)

@qjit
@qml.qnode(qml.device("lightning.qubit", wires=2))
def circuit():
    qml.Hadamard(0)
    qml.ctrl(qml.TrotterProduct(H, time=2.4, order=2), control=[1])
    return qml.state()

Catalyst is now officially supported on Linux aarch64, with pre-built binaries available on PyPI; simply pip install pennylane-catalyst on Linux aarch64 systems. (#767)

Improvements

Validation is now performed for observables and operations to ensure that provided circuits are compatible with the devices for execution. (#626) (#783)

dev = qml.device("lightning.qubit", wires=2, shots=10000)

@qjit
@qml.qnode(dev)
def circuit(x):
    qml.Hadamard(wires=0)
    qml.CRX(x, wires=[0, 1])
    return qml.var(qml.PauliZ(1))

>>> circuit(0.43)
DifferentiableCompileError: Variance returns are forbidden in gradients

Catalyst’s adjoint and ctrl methods are now fully compatible with the PennyLane equivalent when applied to a single Operator. This should lead to improved compatibility with PennyLane library code, as well as when reusing quantum functions with both Catalyst and PennyLane. (#768) (#771) (#802)
Controlled operations defined via specialized classes (like Toffoli or ControlledQubitUnitary) are now implemented as controlled versions of their base operation if the device supports it. In particular, MultiControlledX is no longer executed as a QubitUnitary with Lightning. (#792)
The Catalyst frontend now supports Python logging through PennyLane’s qml.logging module. For more details, please see the logging documentation. (#660)
Catalyst now performs a stricter validation of the wire requirements for devices. In particular, only integer, continuous wire labels starting at 0 are allowed. (#784)
Catalyst no longer disallows quantum circuits with 0 qubits. (#784)
Added support for IsingZZ as a native gate in Catalyst. Previously, the IsingZZ gate would be decomposed into CNOT and RZ gates, even if a device supported it. (#730)
All decorators in Catalyst, including vmap, qjit, mitigate_with_zne, as well as gradient decorators grad, jacobian, jvp, and vjp, can now be used both with and without keyword arguments as a decorator without the need for functools.partial: (#758) (#761) (#762) (#763)
```
@qjit
@grad(method="fd")
def fn1(x):
    return x ** 2

@qjit(autograph=True)
@grad
def fn2(x):
    return jnp.sin(x)
```
```
>>> fn1(0.43)
array(0.8600001)
>>> fn2(0.12)
array(0.99280864)
```
The built-in instrumentation with detailed output will no longer report the cumulative time for MLIR pipelines, since the cumulative time was being reported as just another step alongside individual timings for each pipeline. (#772)
Raise a better error message when no shots are specified and qml.sample or qml.counts is used. (#786)
The finite difference method for differentiation is now always allowed, even on functions with mid-circuit measurements, callbacks without custom derivatives, or other operations that cannot be differentiated via traditional autodiff. (#789)
A non_commuting_observables flag has been added to the device TOML schema, indicating whether or not the device supports measuring non-commuting observables. If false, non-commuting measurements will be split into multiple executions. (#821)
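The finite-difference method referred to above only needs function evaluations, which is why it can be applied where autodiff cannot. A minimal sketch, assuming a forward difference with step size h = 1e-7: both the formula and the step size are assumptions (they happen to reproduce the fn1 output shown earlier), and finite_diff_grad is a hypothetical helper, not Catalyst's implementation.

```python
def finite_diff_grad(fn, x, h=1e-7):
    """Forward-difference estimate of d(fn)/dx at x (assumed scheme and step size)."""
    return (fn(x + h) - fn(x)) / h


def square(x):
    return x ** 2


estimate = finite_diff_grad(square, 0.43)
print(round(estimate, 7))  # close to the exact derivative 2 * 0.43 = 0.86
```

Because the estimate treats fn as a black box, its accuracy depends on the step size rather than on whether each operation inside fn has a derivative rule.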

The underlying PennyLane Operation objects for cond, for_loop, and while_loop can now be accessed directly via body_function.operation. (#711)

This can be beneficial when, among other things, writing transforms without using the queuing mechanism:

@qml.transform
def my_quantum_transform(tape):
    ops = tape.operations.copy()

    @for_loop(0, 4, 1)
    def f(i, sum):
        qml.Hadamard(0)
        return sum+1

    res = f(0)
    ops.append(f.operation)   # This is now supported!

    def post_processing_fn(results):
        return results
    modified_tape = qml.tape.QuantumTape(ops, tape.measurements)
    print(res)
    print(modified_tape.operations)
    return [modified_tape], post_processing_fn

@qml.qjit
@my_quantum_transform
@qml.qnode(qml.device("lightning.qubit", wires=2))
def main():
    qml.Hadamard(0)
    return qml.probs()

>>> main()
Traced<ShapedArray(int64[], weak_type=True)>with<DynamicJaxprTrace(level=2/1)>
[Hadamard(wires=[0]), ForLoop(tapes=[[Hadamard(wires=[0])]])]
(array([0.5, 0. , 0.5, 0. ]),)

Breaking changes

Binary distributions for Linux are now based on manylinux_2_28 instead of manylinux_2014. As a result, Catalyst will only be compatible with systems running glibc version 2.28 or above (e.g., Ubuntu 20.04 and above). (#663)
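To check whether a given machine meets this requirement, the glibc version can be queried from Python's standard library. The helper below is a hypothetical convenience function (glibc_at_least is not part of Catalyst); it returns None on systems where glibc is not detected.

```python
import platform
import re


def glibc_at_least(required=(2, 28)):
    """Return True or False on glibc systems, or None when glibc is not detected."""
    lib, version = platform.libc_ver()
    # Keep only the leading "major.minor" numbers of the reported version string.
    numbers = tuple(int(n) for n in re.findall(r"\d+", version)[:2])
    if lib != "glibc" or len(numbers) < 2:
        return None
    return numbers >= required


print(glibc_at_least())
```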

Bug fixes

Functions that have been annotated with return type annotations will now correctly compile with @qjit. (#751)
An issue in the Lightning backend for the Catalyst runtime has been fixed that would only compute approximate probabilities when implementing mid-circuit measurements. As a result, low shot numbers would lead to unexpected behaviours or projections on zero probability states. Probabilities for mid-circuit measurements are now always computed analytically. (#801)
The Catalyst runtime now raises an error if a qubit is accessed out of bounds from the allocated register. (#784)
jax.scipy.linalg.expm is now supported within qjit-compiled functions. (#733) (#752)

This required correctly linking openblas routines necessary for jax.scipy.linalg.expm. In this bug fix, four openblas routines were newly linked and are now discoverable by stablehlo.custom_call@<blas_routine>. They are blas_dtrsm, blas_ztrsm, lapack_dgetrf, lapack_zgetrf.
Fixes a bug where QNodes that contained QubitUnitary with a complex matrix would error during gradient computation. (#778)
Callbacks can now return types which can be flattened and unflattened. (#812)
catalyst.qjit and catalyst.grad now work correctly on functions that have been wrapped with functools.partial. (#820)
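To illustrate the mid-circuit measurement fix above (#801), the sketch below computes the outcome probability analytically from the state vector, selects a branch, and renormalizes. It is a toy model under stated assumptions: the state is a plain list of amplitudes, wire 0 is taken to be the most significant bit, and measure_and_collapse is a hypothetical helper rather than the Lightning collapse method.

```python
import math
import random


def measure_and_collapse(state, wire, num_wires, rng=random):
    """Measure one qubit of a state vector and return (outcome, collapsed_state)."""
    shift = num_wires - 1 - wire  # assumed bit ordering: wire 0 is the top bit
    # Analytic probability of reading 1 on the chosen wire.
    p1 = sum(abs(a) ** 2 for idx, a in enumerate(state) if (idx >> shift) & 1)
    outcome = 1 if rng.random() < p1 else 0
    norm = math.sqrt(p1 if outcome else 1.0 - p1)
    # Keep only the amplitudes consistent with the outcome, then renormalize.
    collapsed = [
        a / norm if ((idx >> shift) & 1) == outcome else 0j
        for idx, a in enumerate(state)
    ]
    return outcome, collapsed


bell = [1 / math.sqrt(2), 0j, 0j, 1 / math.sqrt(2)]
outcome, collapsed = measure_and_collapse(bell, wire=0, num_wires=2, rng=random.Random(0))
print(outcome, [round(abs(a), 6) for a in collapsed])
```

Since the branch probability comes directly from the amplitudes rather than from a finite number of shots, a branch whose computed probability is exactly zero is not selected in this model.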

Internal changes

Catalyst uses the collapse method of Lightning simulators in Measure to select a state vector branch and normalize. (#801)
Measurement process primitives for Catalyst’s JAXPR representation now have a standardized call signature so that shots and shape can both be provided as keyword arguments. (#790)
The QCtrl class in Catalyst has been renamed to HybridCtrl, indicating its capability to contain a nested scope of both quantum and classical operations. Using ctrl on a single operation will now directly dispatch to the equivalent PennyLane class. (#771)
The Adjoint class in Catalyst has been renamed to HybridAdjoint, indicating its capability to contain a nested scope of both quantum and classical operations. Using adjoint on a single operation will now directly dispatch to the equivalent PennyLane class. (#768) (#802)
Add support to use a locally cloned PennyLane Lightning repository with the runtime. (#732)
The qjit_device.py and preprocessing.py modules have been refactored into the sub-package catalyst.device. (#721)
The ag_autograph.py and autograph.py modules have been refactored into the sub-package catalyst.autograph. (#722)
Callback refactoring. This refactoring creates the classes FlatCallable and MemrefCallable. (#742)

The FlatCallable class is a Callable that is initialized by providing some parameters and kwparameters that match the expected shapes that will be received at the callsite. Instead of taking shaped *args and **kwargs, it receives flattened arguments. The flattened arguments are unflattened with the shapes with which the function was initialized. The FlatCallable return values will always be flattened before returning to the caller.

The MemrefCallable is a subclass of FlatCallable. It takes a result type parameter during initialization that corresponds to the expected return type. This class is expected to be called only from the Catalyst runtime. It expects all arguments to be void* to memrefs. These void* values are cast to MemrefStructDescriptors using ctypes, then to numpy arrays, and finally to jax arrays. These flat jax arrays are then sent to the FlatCallable. The return values match those expected by the Catalyst runtime.

This separation allows for a better separation of concerns, provides a nicer interface and allows for multiple MemrefCallable to be defined for a single callback, which is necessary for custom gradient of pure_callbacks.
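The flatten/unflatten round trip described for FlatCallable can be sketched in pure Python. Everything below is hypothetical and for illustration only: flatten, unflatten, and FlatCallableSketch are not Catalyst's classes, and only lists, tuples, and dicts are treated as containers.

```python
def flatten(tree):
    """Return (leaves, treedef) for nested lists, tuples, and dicts."""
    if isinstance(tree, (list, tuple)):
        leaves, defs = [], []
        for item in tree:
            sub_leaves, sub_def = flatten(item)
            leaves.extend(sub_leaves)
            defs.append(sub_def)
        return leaves, (type(tree), defs)
    if isinstance(tree, dict):
        leaves, defs = [], []
        for key in sorted(tree):  # sorted keys give a deterministic leaf order
            sub_leaves, sub_def = flatten(tree[key])
            leaves.extend(sub_leaves)
            defs.append((key, sub_def))
        return leaves, (dict, defs)
    return [tree], None  # anything else is a leaf


def unflatten(treedef, leaves):
    """Rebuild the nested structure described by treedef from a flat leaf sequence."""
    it = iter(leaves)

    def build(d):
        if d is None:
            return next(it)
        kind, children = d
        if kind is dict:
            return {key: build(child) for key, child in children}
        return kind(build(child) for child in children)

    return build(treedef)


class FlatCallableSketch:
    """Takes flat arguments, calls func with shaped ones, returns flat results."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        # Only the structure of the example arguments is remembered.
        _, self.in_def = flatten((args, kwargs))

    def __call__(self, *flat_args):
        args, kwargs = unflatten(self.in_def, flat_args)
        leaves, _ = flatten(self.func(*args, **kwargs))
        return leaves


def scale(point, factor=1.0):
    return {"x": point[0] * factor, "y": point[1] * factor}


flat_fn = FlatCallableSketch(scale, (0.0, 0.0), factor=0.0)
print(flat_fn(1.0, 2.0, 3.0))  # [3.0, 6.0]
```

The caller only ever sees flat sequences on both sides, which is the property that lets a runtime-facing wrapper work with a fixed, positional calling convention.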
A new catalyst::gradient::GradientOpInterface is available when querying the gradient method in the MLIR C++ API. (#800)

catalyst::gradient::GradOp, ValueAndGradOp, JVPOp, and VJPOp now inherit traits from this new GradientOpInterface. The supported attributes are now getMethod(), getCallee(), getDiffArgIndices(), getDiffArgIndicesAttr(), getFiniteDiffParam(), and getFiniteDiffParamAttr().
- There are operations that could potentially be used as GradOp, ValueAndGradOp, JVPOp or VJPOp. When trying to get the gradient method, instead of doing
```
auto gradOp = dyn_cast<GradOp>(op);
auto jvpOp = dyn_cast<JVPOp>(op);
auto vjpOp = dyn_cast<VJPOp>(op);

llvm::StringRef MethodName;
if (gradOp)
    MethodName = gradOp.getMethod();
else if (jvpOp)
    MethodName = jvpOp.getMethod();
else if (vjpOp)
    MethodName = vjpOp.getMethod();
```
  to identify which op it actually is and protect against segfaults (calling nullptr.getMethod()), in the new interface we just do
```
auto gradOpInterface = cast<GradientOpInterface>(op);
llvm::StringRef MethodName = gradOpInterface.getMethod();
```
- Another advantage is that any concrete gradient operation object can behave like a GradientOpInterface:
```
GradOp op; // or ValueAndGradOp op, ...
auto foo = [](GradientOpInterface op){
  llvm::errs() << op.getCallee();
};
foo(op);  // this works!
```
- Finally, concrete op specific methods can still be called by “reinterpret”-casting the interface back to a concrete op (provided the concrete op type is correct):
```
auto foo = [](GradientOpInterface op){
  size_t numGradients = cast<ValueAndGradOp>(&op)->getGradients().size();
};
ValueAndGradOp op;
foo(op);  // this works!
```

Contributors

This release contains contributions from (in alphabetical order):

Ali Asadi, Lillian M.A. Frederiksen, David Ittah, Christina Lee, Erick Ochoa, Haochen Paul Wang, Lee James O’Riordan, Mehrdad Malekmohammadi, Vincent Michaud-Rioux, Mudit Pandey, Raul Torres, Sergei Mironov, Tzung-Han Juang.

Release 0.6.0¶

New features

Catalyst now supports externally hosted callbacks with parameters and return values within qjit-compiled code. This provides the ability to insert native Python code into any qjit-compiled function, allowing for the capability to include subroutines that do not yet support qjit-compilation and enhancing the debugging experience. (#540) (#596) (#610) (#650) (#649) (#661) (#686) (#689)

The following two callback functions are available:
- catalyst.pure_callback supports callbacks of pure functions. That is, functions with no side-effects that accept parameters and return values. However, the return type and shape of the function must be known in advance, and is provided as a type signature.
```
@pure_callback
def callback_fn(x) -> float:
    # here we call non-JAX compatible code, such
    # as standard NumPy
    return np.sin(x)

@qjit
def fn(x):
    return jnp.cos(callback_fn(x ** 2))
```
```
>>> fn(0.654)
array(0.9151995)
```
- catalyst.debug.callback supports callbacks of functions with no return values. This makes it an easy entry point for debugging, for example via printing or logging at runtime.
```
@catalyst.debug.callback
def callback_fn(y):
    print("Value of y =", y)

@qjit
def fn(x):
    y = jnp.sin(x)
    callback_fn(y)
    return y ** 2
```
```
>>> fn(0.54)
Value of y = 0.5141359916531132
array(0.26433582)
>>> fn(1.52)
Value of y = 0.998710143975583
array(0.99742195)
```
Note that callbacks do not currently support differentiation, and cannot be used inside functions that catalyst.grad is applied to.
More flexible runtime printing through support for format strings. (#621)

The catalyst.debug.print function has been updated to support Python-like format strings:
```
@qjit
def cir(a, b, c):
    debug.print("{c} {b} {a}", a=a, b=b, c=c)
```
```
>>> cir(1, 2, 3)
3 2 1
```
Note that previous functionality of the print function to print out memory reference information of variables has been moved to catalyst.debug.print_memref.

Catalyst now supports QNodes that execute on Oxford Quantum Circuits (OQC) superconducting hardware, via OQC Cloud. (#578) (#579) (#691)

To use OQC Cloud with Catalyst, simply ensure your credentials are set as environment variables, and load the oqc.cloud device to be used within your qjit-compiled workflows.

import os
os.environ["OQC_EMAIL"] = "your_email"
os.environ["OQC_PASSWORD"] = "your_password"
os.environ["OQC_URL"] = "oqc_url"

dev = qml.device("oqc.cloud", backend="lucy", shots=2012, wires=2)

@qjit
@qml.qnode(dev)
def circuit(a: float):
    qml.Hadamard(0)
    qml.CNOT(wires=[0, 1])
    qml.RX(a, wires=0)
    return qml.counts(wires=[0, 1])

print(circuit(0.2))

Catalyst now ships with an instrumentation feature that allows users to explore which steps are run during compilation and execution, and for how long. (#528) (#597)

Instrumentation can be enabled from the frontend with the catalyst.debug.instrumentation context manager:
```
>>> @qjit
... def expensive_function(a, b):
...     return a + b
>>> with debug.instrumentation("session_name", detailed=False):
...     expensive_function(1, 2)
[DIAGNOSTICS] Running capture                   walltime: 3.299 ms      cputime: 3.294 ms       programsize: 0 lines
[DIAGNOSTICS] Running generate_ir               walltime: 4.228 ms      cputime: 4.225 ms       programsize: 14 lines
[DIAGNOSTICS] Running compile                   walltime: 57.182 ms     cputime: 12.109 ms      programsize: 121 lines
[DIAGNOSTICS] Running run                       walltime: 1.075 ms      cputime: 1.072 ms
```
The results will be appended to the provided file if the filename attribute is set, and printed to the console otherwise. The flag detailed determines whether individual steps in the compiler and runtime are instrumented, or whether only high-level steps like “program capture” and “compilation” are reported.

Measurements currently include wall time, CPU time, and (intermediate) program size.
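The difference between wall time and CPU time in the output above (for example, the compile step reports a much larger walltime than cputime) can be reproduced with a small standard-library timer. This is a sketch only: timed is a hypothetical helper, not the catalyst.debug.instrumentation implementation, and the result format is an assumption.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(step, results):
    """Record wall-clock and CPU time (in ms) spent inside the with-block."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        yield
    finally:
        results[step] = {
            "walltime_ms": (time.perf_counter() - wall_start) * 1e3,
            "cputime_ms": (time.process_time() - cpu_start) * 1e3,
        }


results = {}
with timed("run", results):
    total = sum(i * i for i in range(10_000))

print(results["run"])
```

Wall time includes any period spent waiting (for instance on I/O or another process), whereas CPU time counts only the time the current process was actually executing.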

Improvements

AutoGraph now supports return statements inside conditionals in qjit-compiled functions. (#583)

For example, the following pattern is now supported, as long as all return values have the same type:

@qjit(autograph=True)
def fn(x):
    if x > 0:
        return jnp.sin(x)
    return jnp.cos(x)

>>> fn(0.1)
array(0.09983342)
>>> fn(-0.1)
array(0.99500417)

This support extends to quantum circuits:

dev = qml.device("lightning.qubit", wires=1)

@qjit(autograph=True)
@qml.qnode(dev)
def f(x: float):
  qml.RX(x, wires=0)

  m = catalyst.measure(0)

  if not m:
      return m, qml.expval(qml.PauliZ(0))

  qml.RX(x ** 2, wires=0)

  return m, qml.expval(qml.PauliZ(0))

>>> f(1.4)
(array(False), array(1.))
>>> f(1.4)
(array(True), array(0.37945176))

Note that returning results with different types or shapes within the same function, such as different observables or differently shaped arrays, is not possible.

Errors are now raised at compile time if the gradient of an unsupported function is requested. (#204)

At the moment, CompileError exceptions will be raised if at compile time it is found that code reachable from the gradient operation contains either a mid-circuit measurement, a callback, or a JAX-style custom call (which happens through the mitigation operation as well as certain JAX operations).
Catalyst now supports devices built from the new PennyLane device API. (#565) (#598) (#599) (#636) (#638) (#664) (#687)

When using the new device API, Catalyst will discard the preprocessing from the original device, replacing it with Catalyst-specific preprocessing based on the TOML file provided by the device. Catalyst also requires that provided devices specify their wires upfront.
A new compiler optimization that removes redundant chains of self inverse operations has been added. This is done within a new MLIR pass called remove-chained-self-inverse. Currently we only match redundant Hadamard operations, but the list of supported operations can be expanded. (#630)
The catalyst.measure operation is now more lenient in the accepted type for the wires parameter. In addition to a scalar, a 1D array is also accepted as long as it only contains one element. (#623)

For example, the following is now supported:
```
catalyst.measure(wires=jnp.array([0]))
```
The compilation & execution of @qjit compiled functions can now be aborted using an interrupt signal (SIGINT). This includes using CTRL-C from a command line and the Interrupt button in a Jupyter Notebook. (#642)
The Catalyst Amazon Braket support has been updated to work with the latest version of the Amazon Braket PennyLane plugin (v1.25.0) and Amazon Braket Python SDK (v1.73.3) (#620) (#672) (#673)

Note that with this update, all declared qubits in a submitted program will always be measured, even if specific qubits were never used.
An updated quantum device specification format, TOML schema v2, is now supported by Catalyst. This allows device authors to specify properties such as native quantum control support, gate invertibility, and differentiability on a per-operation level. (#554)

For more details on the new TOML schema, please refer to the custom devices documentation.
An exception is now raised when OpenBLAS cannot be found by Catalyst during compilation. (#643)

Breaking changes

qml.sample and qml.counts now produce integer arrays for the sample array and basis state array when used without observables. (#648)
The endianness of counts in Catalyst now matches the convention of PennyLane. (#601)
catalyst.debug.print no longer supports the memref keyword argument. Please use catalyst.debug.print_memref instead. (#621)

Bug fixes

The QNode argument diff_method=None is now supported for QNodes within a qjit-compiled function. (#658)
A bug has been fixed where the C++ compiler driver was incorrectly being triggered twice. (#594)
Programs with jnp.reshape no longer fail. (#592)
A bug in the quantum adjoint routine in the compiler has been fixed, which didn’t take into account control wires on operations in all instances. (#591)
A bug in the test suite causing stochastic autograph test failures has been fixed. (#652)
Running Catalyst tests should no longer raise ResourceWarning from the use of tempfile.TemporaryDirectory. (#676)
Raises an exception if the user has an incompatible CUDA Quantum version installed. (#707)

Internal changes

The deprecated @qfunc decorator, in use mainly by the LIT test suite, has been removed. (#679)
Catalyst now publishes a revision string under catalyst.__revision__, in addition to the existing catalyst.__version__ string. The revision contains the Git commit hash of the repository at the time of packaging, or for editable installations the active commit hash at the time of package import. (#560)
The Python interpreter is now a shared resource across the runtime. (#615)

This change allows any part of the runtime to start executing Python code through pybind.

Contributors

This release contains contributions from (in alphabetical order):

Ali Asadi, David Ittah, Romain Moyard, Sergei Mironov, Erick Ochoa Lopez, Lee James O’Riordan, Muzammiluddin Syed.

Release 0.5.0¶

New features

Catalyst now provides a QJIT compatible catalyst.vmap function, which makes it even easier to modify functions to map over inputs with additional batch dimensions. (#497) (#569)

When working with tensor/array frameworks in Python, it can be important to ensure that code is written to minimize usage of Python for loops (which can be slow and inefficient), and instead push as much of the computation through to the array manipulation library, by taking advantage of extra batch dimensions.

For example, consider the following QNode:
```
dev = qml.device("lightning.qubit", wires=1)

@qml.qnode(dev)
def circuit(x, y):
    qml.RX(jnp.pi * x[0] + y, wires=0)
    qml.RY(x[1] ** 2, wires=0)
    qml.RX(x[1] * x[2], wires=0)
    return qml.expval(qml.PauliZ(0))
```
```
>>> circuit(jnp.array([0.1, 0.2, 0.3]), jnp.pi)
Array(-0.93005586, dtype=float64)
```
We can use catalyst.vmap to introduce additional batch dimensions to our input arguments, without needing to use a Python for loop:
```
>>> x = jnp.array([[0.1, 0.2, 0.3],
...                [0.4, 0.5, 0.6],
...                [0.7, 0.8, 0.9]])
>>> y = jnp.array([jnp.pi, jnp.pi / 2, jnp.pi / 4])
>>> qjit(vmap(circuit))(x, y)
array([-0.93005586, -0.97165424, -0.6987465 ])
```
catalyst.vmap() has been implemented to match the behaviour of jax.vmap, so it should be a drop-in replacement in most cases. Under the hood, it automatically inserts Catalyst-compatible for loops, which will be compiled and executed outside of Python for increased performance.
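Conceptually, mapping over a batch dimension is equivalent to looping over the leading axis of every argument and collecting the results. The sketch below models only that semantics in pure Python: vmap_model and weighted_sum are hypothetical names, nested lists stand in for arrays, and none of this reflects how Catalyst generates its compiled loops.

```python
def vmap_model(fn):
    """Hypothetical stand-in for vmap: map fn over the leading axis of every argument."""
    def batched(*batched_args):
        # zip(*...) pairs up the i-th slice of each batched argument.
        return [fn(*slices) for slices in zip(*batched_args)]
    return batched


def weighted_sum(x, y):
    return sum(x) * y


xs = [[1.0, 2.0], [3.0, 4.0]]
ys = [10.0, 100.0]
print(vmap_model(weighted_sum)(xs, ys))  # [30.0, 700.0]
```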
Catalyst now supports compiling and executing QJIT-compiled QNodes using the CUDA Quantum compiler toolchain. (#477) (#536) (#547)

Simply import the CUDA Quantum @cudaqjit decorator to use this functionality:
```
from catalyst.cuda import cudaqjit
```
Or, if using Catalyst from PennyLane, simply specify @qml.qjit(compiler="cuda_quantum").

The following devices are available when compiling with CUDA Quantum:
- softwareq.qpp: a modern C++ state-vector simulator
- nvidia.custatevec: The NVIDIA CuStateVec GPU simulator (with support for multi-gpu)
- nvidia.cutensornet: The NVIDIA CuTensorNet GPU simulator (with support for matrix product state)
For example:
```
dev = qml.device("softwareq.qpp", wires=2)

@cudaqjit
@qml.qnode(dev)
def circuit(x):
    qml.RX(x[0], wires=0)
    qml.RY(x[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliY(0))
```
```
>>> circuit(jnp.array([0.5, 1.4]))
-0.47244976756708373
```
Note that CUDA Quantum compilation currently does not have feature parity with Catalyst compilation; in particular, AutoGraph, control flow, differentiation, and various measurement statistics (such as probabilities and variance) are not yet supported. Classical code support is also limited.

Catalyst now supports just-in-time compilation of static (compile-time constant) arguments. (#476) (#550)

The @qjit decorator takes a new argument static_argnums, which specifies which positional arguments of the decorated function should be treated as compile-time static arguments.

This allows any hashable Python object to be passed to the function during compilation; the function will only be re-compiled if the hash value of the static arguments changes. Otherwise, re-using previous static argument values will result in no re-compilation.

@qjit(static_argnums=(1,))
def f(x, y):
    print(f"Compiling with y={y}")
    return x + y

>>> f(0.5, 0.3)
Compiling with y=0.3
array(0.8)
>>> f(0.1, 0.3)  # no re-compilation occurs
array(0.4)
>>> f(0.1, 0.4)  # y changes, re-compilation
Compiling with y=0.4
array(0.5)

This functionality can be used to support passing arbitrary Python objects to QJIT-compiled functions, as long as they are hashable:

from dataclasses import dataclass

@dataclass
class MyClass:
    val: int

    def __hash__(self):
        return hash(str(self))

@qjit(static_argnums=(1,))
def f(x: int, y: MyClass):
    return x + y.val

>>> f(1, MyClass(5))
array(6)
>>> f(1, MyClass(6))  # re-compilation
array(7)
>>> f(2, MyClass(5))  # no re-compilation
array(7)
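The recompilation behaviour in both examples can be modelled as a cache keyed on the hash of the static arguments. The class below is a hypothetical pure-Python model (StaticArgCacheSketch is not part of Catalyst, and "compiling" is reduced to incrementing a counter); it assumes the cache key is the hash value alone, as the description above suggests.

```python
class StaticArgCacheSketch:
    """Hypothetical model of hash-keyed recompilation (not Catalyst's implementation)."""

    def __init__(self, fn, static_argnums):
        self.fn = fn
        self.static_argnums = tuple(static_argnums)
        self.cache = {}        # static hash key -> specialised callable
        self.compilations = 0  # number of simulated compilations

    def _specialise(self, statics, num_args):
        def compiled(*dynamic):
            it = iter(dynamic)
            # Re-insert the baked-in static values at their original positions.
            full = [statics[i] if i in statics else next(it) for i in range(num_args)]
            return self.fn(*full)
        return compiled

    def __call__(self, *args):
        key = tuple(hash(args[i]) for i in self.static_argnums)
        if key not in self.cache:
            self.compilations += 1
            statics = {i: args[i] for i in self.static_argnums}
            self.cache[key] = self._specialise(statics, len(args))
        dynamic = [a for i, a in enumerate(args) if i not in self.static_argnums]
        return self.cache[key](*dynamic)


f = StaticArgCacheSketch(lambda x, y: x + y, static_argnums=(1,))
f(0.5, 0.3)  # first call: compiles for y=0.3
f(0.1, 0.3)  # same static hash: the cached version is reused
f(0.1, 0.4)  # new static hash: compiles again
print(f.compilations)  # 2
```

One consequence of keying on the hash is that the __hash__ implementation of a custom class decides when recompilation happens, so it should reflect every field that influences the traced program.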

Mid-circuit measurements now support post-selection and qubit reset when used with the Lightning simulators. (#491) (#507)

To specify post-selection, simply pass the postselect argument to the catalyst.measure function:

dev = qml.device("lightning.qubit", wires=1)

@qjit
@qml.qnode(dev)
def f():
    qml.Hadamard(0)
    m = measure(0, postselect=1)
    return qml.expval(qml.PauliZ(0))

Likewise, to reset a wire after mid-circuit measurement, simply specify reset=True:

dev = qml.device("lightning.qubit", wires=1)

@qjit
@qml.qnode(dev)
def f():
    qml.Hadamard(0)
    m = measure(0, reset=True)
    return qml.expval(qml.PauliZ(0))

Improvements

Catalyst now supports Python 3.12 (#532)
The JAX version used by Catalyst has been updated to v0.4.23. (#428)
Catalyst now supports the qml.GlobalPhase operation. (#563)

Native support for qml.PSWAP and qml.ISWAP gates on Amazon Braket devices has been added. (#458)

Specifically, a circuit like

dev = qml.device("braket.local.qubit", wires=2, shots=100)

@qjit
@qml.qnode(dev)
def f(x: float):
    qml.Hadamard(0)
    qml.PSWAP(x, wires=[0, 1])
    qml.ISWAP(wires=[1, 0])
    return qml.probs()

would no longer decompose the PSWAP and ISWAP gates.

Add support for GlobalPhase gate in the runtime. (#563)
The qml.BlockEncode operator is now supported with Catalyst. (#483)
Catalyst no longer relies on a TensorFlow installation for its AutoGraph functionality. Instead, the standalone diastatic-malt package is used and automatically installed as a dependency. (#401)
The @qjit decorator will remember previously compiled functions when the PyTree metadata of arguments changes, in addition to also remembering compiled functions when static arguments change. (#522)

The following example will no longer trigger a third compilation:
```
@qjit
def func(x):
    print("compiling")
    return x
```
```
>>> func([1,]);             # list
compiling
>>> func((2,));             # tuple
compiling
>>> func([3,]);             # list
```
Note however that in order to keep overheads low, changing the argument type or shape (in a promotion incompatible way) may override a previously stored function (with identical PyTree metadata and static argument values):
```
@qjit
def func(x):
    print("compiling")
    return x
```
```
>>> func(jnp.array(1));     # scalar
compiling
>>> func(jnp.array([2.]));  # 1-D array
compiling
>>> func(jnp.array(3));     # scalar
compiling
```

Catalyst gradient functions (grad, jacobian, vjp, and jvp) now support being applied to functions that use (nested) container types as inputs and outputs. This includes lists and dictionaries, as well as any data structure implementing the PyTree protocol. (#500) (#501) (#508) (#549)

dev = qml.device("lightning.qubit", wires=1)

@qml.qnode(dev)
def circuit(phi, psi):
    qml.RY(phi, wires=0)
    qml.RX(psi, wires=0)
    return [{"expval0": qml.expval(qml.PauliZ(0))}, qml.expval(qml.PauliZ(0))]

phi = 0.1
psi = 0.2

>>> qjit(jacobian(circuit, argnum=[0, 1]))(phi, psi)
[{'expval0': (array(-0.0978434), array(-0.19767681))}, (array(-0.0978434), array(-0.19767681))]

Support has been added for linear algebra functions which depend on computing the eigenvalues of symmetric matrices, such as qml.math.sqrt_matrix(). (#488)

For example, you can compile qml.math.sqrt_matrix:
```
@qml.qjit
def workflow(A):
    B = qml.math.sqrt_matrix(A)
    return B @ A
```
Internally, this involves support for lowering the eigenvectors/values computation lapack method lapack_dsyevd via stablehlo.custom_call.
Additional debugging functions are now available in the catalyst.debug module. (#529) (#522)

This includes:
- filter_static_args(args, static_argnums) to remove static values from arguments using the provided index list.
- get_cmain(fn, *args) to return a C program that calls a jitted function with the provided arguments.
- print_compilation_stage(fn, stage) to print one of the recorded compilation stages for a JIT-compiled function.
For more details, please see the catalyst.debug documentation.
Remove redundant copies of TOML files for lightning.kokkos and lightning.qubit. (#472)

lightning.kokkos and lightning.qubit now ship with their own TOML file. As such, we use the TOML file provided by them.
Capturing quantum circuits with many gates prior to compilation is now quadratically faster (up to a factor), by removing qextract_p and qinst_p from forced-order primitives. (#469)
Update AllocateQubit and AllocateQubits in LightningKokkosSimulator to preserve the current state-vector before qubit re-allocations in the runtime dynamic qubits management. (#479)
The PennyLane custom compiler entry point name convention has changed, necessitating a change to the Catalyst entry points. (#493)

Breaking changes

Catalyst gradient functions now match the Jax convention for the returned axes of gradients, Jacobians, VJPs, and JVPs. As a result, the returned tensor shape from various Catalyst gradient functions may differ compared to previous versions of Catalyst. (#500) (#501) (#508)
The Catalyst Python frontend has been partially refactored. The impact on user-facing functionality is minimal, but the location of certain classes and methods used by the package may have changed. (#529) (#522)

The following changes have been made:
- Some debug methods and features on the QJIT class have been turned into free functions and moved to the catalyst.debug module, which will now appear in the public documentation. This includes compiling a program from IR, obtaining a C program to invoke a compiled function from, and printing fine-grained MLIR compilation stages.
- The compilation_pipelines.py module has been renamed to jit.py, and certain functionality has been moved out (see following items).
- A new module compiled_functions.py now manages low-level access to compiled functions.
- A new module tracing/type_signatures.py handles functionality related to managing arguments and type signatures during the tracing process.
- The contexts.py module has been moved from utils to the new tracing sub-module.

Internal changes

Changes to the runtime QIR API and dependencies, to avoid symbol conflicts with other libraries that utilize QIR. (#464) (#470)

The existing Catalyst runtime implements QIR as a library that can be linked against a QIR module. This works great when Catalyst is the only implementor of QIR, however it may generate symbol conflicts when used alongside other QIR implementations.

To avoid this, two changes were necessary:
- The Catalyst runtime now has a different API from QIR instructions.
  
  The runtime has been modified such that QIR instructions are lowered to functions where the __quantum__ part of the function name is replaced with __catalyst__. This prevents the possibility of symbol conflicts with other libraries that implement QIR as a library.
- The Catalyst runtime no longer depends on QIR runner’s stdlib.
  
  We no longer depend on or link against QIR runner’s stdlib. By linking against QIR runner’s stdlib, some definitions persisted that may be different from the ones used by third-party implementors. To prevent symbol conflicts, QIR runner’s stdlib was removed and is no longer linked against. As a result, the following functions are now defined and implemented in Catalyst’s runtime:
  - int64_t __catalyst__rt__array_get_size_1d(QirArray *)
  - int8_t *__catalyst__rt__array_get_element_ptr_1d(QirArray *, int64_t)
  and the following functions were removed since the frontend does not generate them
  - QirString *__catalyst__rt__qubit_to_string(QUBIT *)
  - QirString *__catalyst__rt__result_to_string(RESULT *)
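The renaming described above amounts to swapping the leading __quantum__ prefix of each runtime symbol for __catalyst__. A minimal sketch of that mapping follows; to_catalyst_symbol is a hypothetical helper written for illustration, and the actual lowering is performed inside the compiler rather than by string manipulation in Python.

```python
def to_catalyst_symbol(qir_name):
    """Swap a leading __quantum__ prefix for __catalyst__; leave other names untouched."""
    prefix = "__quantum__"
    if not qir_name.startswith(prefix):
        return qir_name
    return "__catalyst__" + qir_name[len(prefix):]


print(to_catalyst_symbol("__quantum__rt__array_get_size_1d"))
# __catalyst__rt__array_get_size_1d
```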
Fix an issue when no qubit number was specified for the qinst primitive. The primitive now correctly deduces the number of qubits when no gate parameters are present. This change is not user facing. (#496)

Bug fixes

Fixed a bug where differentiation of sliced arrays would result in an error. (#552)

def f(x):
  return jax.numpy.sum(x[::2])

x = jax.numpy.array([0.1, 0.2, 0.3, 0.4])

>>> catalyst.qjit(catalyst.grad(f))(x)
[1. 0. 1. 0.]
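The expected gradient can be checked independently of Catalyst with a central finite difference in plain Python. This is a sanity check only; the helper `finite_difference_grad` is introduced here for illustration.

```python
def f(x):
    # Sum of every second element, mirroring jax.numpy.sum(x[::2])
    return sum(x[::2])


def finite_difference_grad(fn, x, h=1e-6):
    """Approximate the gradient of fn at x with central differences."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((fn(up) - fn(down)) / (2 * h))
    return grad


grad = [round(g, 6) for g in finite_difference_grad(f, [0.1, 0.2, 0.3, 0.4])]
print(grad)
```

Only the elements selected by the slice contribute to the sum, so only those positions receive a nonzero derivative.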

Fixed a bug where quantum control applied to a subcircuit was not correctly mapping wires, and the wires in the nested region remained unchanged. (#555)
Catalyst will no longer print a warning that recompilation is triggered when a @qjit decorated function with no arguments is invoked without having been compiled first, for example via the use of target="mlir". (#522)
Fixes a bug in the configuration of dynamic shaped arrays that would cause certain programs to error with TypeError: cannot unpack non-iterable ShapedArray object. (#526)

This is fixed by replacing the code which updates the JAX_DYNAMIC_SHAPES option with a transient_jax_config() context manager which temporarily sets the value of JAX_DYNAMIC_SHAPES to True and then restores the original configuration value following the yield. The context manager is used by trace_to_jaxpr() and lower_jaxpr_to_mlir().
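The set-then-restore pattern described above can be sketched with `contextlib.contextmanager`. The `CONFIG` dictionary and the name `transient_config` are stand-ins chosen for this example; they are not the real JAX configuration object or the Catalyst implementation.

```python
from contextlib import contextmanager

# Stand-in for a global configuration object (hypothetical, not the JAX config).
CONFIG = {"dynamic_shapes": False}


@contextmanager
def transient_config(key, value):
    """Temporarily set CONFIG[key] = value and restore the old value on exit."""
    previous = CONFIG[key]
    CONFIG[key] = value
    try:
        yield
    finally:
        CONFIG[key] = previous  # restored even if the body raises


with transient_config("dynamic_shapes", True):
    inside = CONFIG["dynamic_shapes"]
after = CONFIG["dynamic_shapes"]
print(inside, after)
```

Placing the restore step in a `finally` clause means the original value comes back even when tracing or lowering raises an exception.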
Exceptions encountered in the runtime when using the @qjit option async_qnodes=True will now be properly propagated to the frontend. (#447) (#510)

This is done by:
- changing llvm.call to llvm.invoke
- setting async runtime tokens and values to be errors
- deallocating live tokens and values
Fixes a bug when computing gradients with indexing/slicing, by fixing the scatter operation lowering when updatedWindowsDim is empty. (#475)
Fix the issue in LightningKokkos::AllocateQubits with allocating too many qubit IDs on qubit re-allocation. (#473)
Fixed an issue where wires was incorrectly set as <Wires = [<WiresEnum.AnyWires: -1>]> when using catalyst.adjoint and catalyst.ctrl, by adding a wires property to these operations. (#480)
Fix the issue with multiple lapack symbol definitions in the compiled program by updating the stablehlo.custom_call conversion pass. (#488)

Contributors

This release contains contributions from (in alphabetical order):

Mikhail Andrenkov, Ali Asadi, David Ittah, Tzung-Han Juang, Erick Ochoa Lopez, Romain Moyard, Raul Torres, Haochen Paul Wang.

Release 0.4.1¶

Improvements

Catalyst wheels are now packaged with OpenMP and ZStd, which avoids installing additional requirements separately in order to use pre-packaged Catalyst binaries. (#457) (#478)

Note that OpenMP support for the lightning.kokkos backend has been disabled on macOS x86_64, due to memory issues in the computation of Lightning’s adjoint-jacobian in the presence of multiple OMP threads.

Bug fixes

Resolve an infinite recursion in the decomposition of the Controlled operator whenever computing a Unitary matrix for the operator fails. (#468)
Resolve a failure to generate gradient code for specific input circuits. (#439)

In this case, jnp.mod was used to compute wire values in a for loop, which prevented the gradient architecture from fully separating quantum and classical code. The following program is now supported:
```
@qjit
@grad
@qml.qnode(dev)
def f(x):
    def cnot_loop(j):
        qml.CNOT(wires=[j, jnp.mod((j + 1), 4)])

    for_loop(0, 4, 1)(cnot_loop)()

    return qml.expval(qml.PauliZ(0))
```
Resolve unpredictable behaviour when importing libraries that share Catalyst’s LLVM dependency (e.g. TensorFlow). In some cases, both packages exporting the same symbols from their shared libraries can lead to process crashes and other unpredictable behaviour, since the wrong functions can be called if both libraries are loaded in the current process. The fix involves building shared libraries with hidden (macOS) or protected (Linux) symbol visibility by default, exporting only what is necessary. (#465)
Resolve a failure to find the SciPy OpenBLAS library when running Catalyst, due to a different SciPy version being used to build Catalyst than to run it. (#471)
Resolve a memory leak in the runtime stemming from missing calls to device destructors at the end of programs. (#446)

Contributors

This release contains contributions from (in alphabetical order):

Ali Asadi, David Ittah.

Release 0.4.0¶

New features

Catalyst is now accessible directly within the PennyLane user interface, once Catalyst is installed, allowing easy access to Catalyst just-in-time functionality.

Through the use of the qml.qjit decorator, entire workflows can be JIT compiled down to a machine binary on first-function execution, including both quantum and classical processing. Subsequent calls to the compiled function will execute the previously-compiled binary, resulting in significant performance improvements.
```
import pennylane as qml

dev = qml.device("lightning.qubit", wires=2)

@qml.qjit
@qml.qnode(dev)
def circuit(theta):
    qml.Hadamard(wires=0)
    qml.RX(theta, wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(wires=1))
```
```
>>> circuit(0.5)  # the first call, compilation occurs here
array(0.)
>>> circuit(0.5)  # the precompiled quantum function is called
array(0.)
```
Currently, PennyLane supports the Catalyst hybrid compiler with the qml.qjit decorator, which directly aliases Catalyst’s catalyst.qjit.

In addition to the above qml.qjit integration, the following native PennyLane functions can now be used with the qjit decorator: qml.adjoint, qml.ctrl, qml.grad, qml.jacobian, qml.vjp, qml.jvp, qml.while_loop, qml.for_loop, and qml.cond. These will alias to the corresponding Catalyst functions when used within a qjit context.

For more details on these functions, please refer to the PennyLane compiler documentation and compiler module documentation.
Just-in-time compiled functions now support asynchronous execution of QNodes. (#374) (#381) (#420) (#424) (#433)

Simply specify async_qnodes=True when using the @qjit decorator to enable the async execution of QNodes. Currently, asynchronous execution is only supported by lightning.qubit and lightning.kokkos.

Asynchronous execution will be most beneficial for just-in-time compiled functions that contain — or generate — multiple QNodes.

For example,
```
dev = qml.device("lightning.qubit", wires=2)

@qml.qnode(device=dev)
def circuit(params):
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(wires=0))

@qjit(async_qnodes=True)
def multiple_qnodes(params):
    x = jnp.sin(params)
    y = jnp.cos(params)
    z = jnp.array([circuit(x), circuit(y)]) # will be executed in parallel
    return circuit(z)
```
```
>>> multiple_qnodes(jnp.array([1.0, 2.0]))
1.0
```
Here, the first two circuit executions will occur in parallel across multiple threads, as their execution can occur independently.
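The scheduling idea can be sketched outside of Catalyst with a thread pool: calls that share no data are submitted together, and the call that consumes their results waits for both. The function `fake_qnode` is a purely classical stand-in assumed for this example, and the sketch says nothing about how Catalyst's async runtime is implemented.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def fake_qnode(params):
    """Stand-in for a QNode call (hypothetical): any pure function of its inputs."""
    return math.cos(params[0]) * math.cos(params[1])


def multiple_calls(params):
    x = [math.sin(p) for p in params]
    y = [math.cos(p) for p in params]
    # These two calls share no data, so they can be submitted concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_x = pool.submit(fake_qnode, x)
        future_y = pool.submit(fake_qnode, y)
        z = [future_x.result(), future_y.result()]
    # This call consumes both results, so it has to wait for them.
    return fake_qnode(z)


result = multiple_calls([1.0, 2.0])
print(result)
```

The result is identical to the sequential evaluation; only the order in which the independent calls are scheduled differs.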
Preliminary support for PennyLane transforms has been added. (#280)
```
@qjit
@qml.transforms.split_non_commuting
@qml.qnode(dev)
def circuit(x):
    qml.RX(x,wires=0)
    return [qml.expval(qml.PauliY(0)), qml.expval(qml.PauliZ(0))]
```
```
>>> circuit(0.4)
[array(-0.51413599), array(0.85770868)]
```
Currently, most PennyLane transforms will work with Catalyst as long as:
- The circuit does not include any Catalyst-specific features, such as Catalyst control flow or measurement,
- The QNode returns only lists of measurement processes,
- AutoGraph is disabled, and
- The transformation does not require or depend on the numeric value of dynamic variables.
Catalyst now supports just-in-time compilation of dynamically-shaped arrays. (#366) (#386) (#390) (#411)

The @qjit decorator can now be used to compile functions that accept or contain tensors whose dimensions are not known at compile time; runtime execution with different shapes is supported without recompilation.

In addition, standard tensor initialization functions jax.numpy.ones, jnp.zeros, and jnp.empty now accept dynamic variables (where the value is only known at runtime).
```
@qjit
def func(size: int):
    return jax.numpy.ones([size, size], dtype=float)
```
```
>>> func(3)
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
```
When passing tensors as arguments to compiled functions, the abstracted_axes keyword argument to the @qjit decorator can be used to specify which axes of the input arguments should be treated as abstract (and thus avoid recompilation).

For example, without specifying abstracted_axes, the following sum function would recompile each time an array of different size is passed as an argument:
```
>>> @qjit
... def sum_fn(x):
...     return jnp.sum(x)
>>> sum_fn(jnp.array([1]))     # Compilation happens here.
>>> sum_fn(jnp.array([1, 1]))  # And here!
```
By passing abstracted_axes, we can specify that the first axis of the first argument is to be treated as dynamic during initial compilation:
```
>>> @qjit(abstracted_axes={0: "n"})
... def sum_fn(x):
...     return jnp.sum(x)
>>> sum_fn(jnp.array([1]))     # Compilation happens here.
>>> sum_fn(jnp.array([1, 1]))  # No need to recompile.
```
Note that dynamic arrays in control-flow primitives (such as loops) are not yet supported.
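One way to picture why abstracted axes avoid recompilation is a compilation cache keyed on the argument shape, where abstract axes are replaced by their symbolic name. The following is a toy model under that assumption; `cache_key` and `ToyJit` are hypothetical and do not reflect Catalyst's internals.

```python
def cache_key(shape, abstracted_axes=None):
    """Build a cache key from a shape, replacing abstract axes by their symbolic name."""
    abstracted_axes = abstracted_axes or {}
    return tuple(abstracted_axes.get(axis, dim) for axis, dim in enumerate(shape))


class ToyJit:
    """Toy stand-in for a JIT cache (hypothetical, not Catalyst's implementation)."""

    def __init__(self, abstracted_axes=None):
        self.abstracted_axes = abstracted_axes
        self.compiled = {}

    def __call__(self, values):
        key = cache_key((len(values),), self.abstracted_axes)
        if key not in self.compiled:
            self.compiled[key] = sum  # pretend this is an expensive compilation
        return self.compiled[key](values)


static = ToyJit()
dynamic = ToyJit(abstracted_axes={0: "n"})
for data in ([1], [1, 1], [1, 1, 1]):
    static(data)
    dynamic(data)

print(len(static.compiled), len(dynamic.compiled))
```

In this model the static variant stores one entry per distinct length, while the variant with an abstract first axis stores a single entry for all lengths.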

Error mitigation using the zero-noise extrapolation method is now available through the catalyst.mitigate_with_zne transform. (#324) (#414)

For example, given a noisy device (such as noisy hardware available through Amazon Braket):

dev = qml.device("noisy.device", wires=2)

@qml.qnode(device=dev)
def circuit(x, n):

    @for_loop(0, n, 1)
    def loop_rx(i):
        qml.RX(x, wires=0)

    loop_rx()

    qml.Hadamard(wires=0)
    qml.RZ(x, wires=0)
    loop_rx()
    qml.RZ(x, wires=0)
    qml.CNOT(wires=[1, 0])
    qml.Hadamard(wires=1)
    return qml.expval(qml.PauliY(wires=0))

@qjit
def mitigated_circuit(args, n):
    s = jax.numpy.array([1, 2, 3])
    return mitigate_with_zne(circuit, scale_factors=s)(args, n)

>>> mitigated_circuit(0.2, 5)
0.5655341100116512

In addition, a mitigation dialect has been added to the MLIR layer of Catalyst. It contains a Zero Noise Extrapolation (ZNE) operation, with a lowering to a global folded circuit.
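The extrapolation step of zero-noise extrapolation can be sketched in plain Python: evaluate the observable at several noise scale factors, fit a curve through the results, and read off its value at scale zero. The sketch below assumes polynomial extrapolation through all points and uses a synthetic noise model; neither is taken from the Catalyst implementation.

```python
def extrapolate_to_zero(scale_factors, values):
    """Evaluate at 0 the polynomial through the (scale_factor, value) points (Lagrange form)."""
    estimate = 0.0
    for i, (s_i, v_i) in enumerate(zip(scale_factors, values)):
        weight = 1.0
        for j, s_j in enumerate(scale_factors):
            if j != i:
                weight *= (0.0 - s_j) / (s_i - s_j)
        estimate += weight * v_i
    return estimate


def noisy_expectation(scale):
    # Synthetic model (assumption): the noiseless value 0.6 degrades with the noise scale.
    return 0.6 - 0.03 * scale + 0.002 * scale ** 2


scales = [1.0, 2.0, 3.0]
values = [noisy_expectation(s) for s in scales]
mitigated = extrapolate_to_zero(scales, values)
print(mitigated)
```

With three scale factors the fitted polynomial is quadratic, so the synthetic quadratic model is recovered up to floating-point error.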

Improvements

The three backend devices provided with Catalyst, lightning.qubit, lightning.kokkos, and braket.aws, are now dynamically loaded at runtime. (#343) (#400)

This takes advantage of the new backend plugin system provided in Catalyst v0.3.2, and allows the devices to be packaged separately from the runtime CAPI. Provided backend devices are now loaded at runtime, instead of being linked at compile time.

For more details on the backend plugin system, see the custom devices documentation.

Finite-shot measurement statistics (expval, var, and probs) are now supported for the lightning.qubit and lightning.kokkos devices. Previously, exact statistics were returned even when finite shots were specified. (#392) (#410)

>>> dev = qml.device("lightning.qubit", wires=2, shots=100)
>>> @qjit
... @qml.qnode(dev)
... def circuit(x):
...     qml.RX(x, wires=0)
...     return qml.probs(wires=0)
>>> circuit(0.54)
array([0.94, 0.06])
>>> circuit(0.54)
array([0.93, 0.07])
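The run-to-run variation in the output above is what sampling predicts. A minimal sketch, assuming the textbook probability sin²(x/2) of measuring 1 after RX(x) acts on |0⟩ and using Python's `random` module rather than any Lightning sampler:

```python
import math
import random


def sample_probs(x, shots, seed=0):
    """Estimate [P(0), P(1)] for a qubit after RX(x)|0> from `shots` samples."""
    p1 = math.sin(x / 2) ** 2  # exact probability of measuring 1
    rng = random.Random(seed)
    ones = sum(rng.random() < p1 for _ in range(shots))
    return [(shots - ones) / shots, ones / shots]


exact = [math.cos(0.27) ** 2, math.sin(0.27) ** 2]
estimate = sample_probs(0.54, shots=100)
print(exact, estimate)
```

With 100 shots the estimate can only take values in steps of 0.01, and it fluctuates around the exact probabilities from one seed to the next.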

Catalyst gradient functions grad, jacobian, jvp, and vjp can now be invoked from outside a @qjit context. (#375)

This simplifies the process of writing functions where compilation can be turned on and off easily by adding or removing the decorator. The functions dispatch to their JAX equivalents when the compilation is turned off.

dev = qml.device("lightning.qubit", wires=2)

@qml.qnode(dev)
def circuit(x):
    qml.RX(x, wires=0)
    return qml.expval(qml.PauliZ(0))

>>> grad(circuit)(0.54)  # dispatches to jax.grad
Array(-0.51413599, dtype=float64, weak_type=True)
>>> qjit(grad(circuit))(0.54)  # differentiates using Catalyst
array(-0.51413599)
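Both code paths should agree with the analytic derivative. Assuming the textbook result that the expectation value of PauliZ after RX(x) on |0⟩ is cos(x), the gradient is -sin(x), which a central finite difference confirms in plain Python:

```python
import math


def expval_z(x):
    # Analytic <Z> for RX(x) applied to |0> (textbook result, assumed here)
    return math.cos(x)


def central_difference(fn, x, h=1e-6):
    """Approximate the derivative of fn at x."""
    return (fn(x + h) - fn(x - h)) / (2 * h)


numeric = central_difference(expval_z, 0.54)
analytic = -math.sin(0.54)
print(numeric, analytic)
```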

New lightning.qubit configuration options are now supported via the qml.device loader, including Markov Chain Monte Carlo sampling support. (#369)

dev = qml.device("lightning.qubit", wires=2, shots=1000, mcmc=True)

@qml.qnode(dev)
def circuit(x):
    qml.RX(x, wires=0)
    return qml.expval(qml.PauliZ(0))

>>> circuit(0.54)
array(0.856)

Improvements have been made to the runtime and quantum MLIR dialect in order to support asynchronous execution.
- The runtime now supports multiple active devices managed via a device pool. The new RTDevice data-class and RTDeviceStatus along with the thread_local device instance pointer enable the runtime to better scope the lifetime of device instances concurrently. With these changes, one can create multiple active devices and execute multiple programs in a multithreaded environment. (#381)
- The ability to dynamically release devices has been added via DeviceReleaseOp in the Quantum MLIR dialect. This is lowered to the __quantum__rt__device_release() runtime instruction, which updates the status of the device instance from Active to Inactive. The runtime will reuse this deactivated instance instead of creating a new one automatically at runtime in a multi-QNode workflow when another device with identical specifications is requested. (#381)
- The DeviceOp definition in the Quantum MLIR dialect has been updated to lower a tuple of device information ('lib', 'name', 'kwargs') to a single device initialization call __quantum__rt__device_init(int8_t *, int8_t *, int8_t *). This allows the runtime to initialize device instances without keeping partial information of the device. (#396)
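The reuse behaviour described in the list above can be modelled with a small device pool. This is a toy Python model written for illustration: the class and method names are hypothetical and only loosely follow the description, and the real runtime is implemented in C++.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    ACTIVE = "Active"
    INACTIVE = "Inactive"


@dataclass
class Device:
    lib: str
    name: str
    kwargs: str
    status: Status = Status.ACTIVE


class DevicePool:
    """Toy device pool (hypothetical model of the behaviour described above)."""

    def __init__(self):
        self.devices = []

    def init_device(self, lib, name, kwargs):
        # Reuse an inactive device with identical specifications if one exists.
        for dev in self.devices:
            same_spec = (dev.lib, dev.name, dev.kwargs) == (lib, name, kwargs)
            if dev.status is Status.INACTIVE and same_spec:
                dev.status = Status.ACTIVE
                return dev
        dev = Device(lib, name, kwargs)
        self.devices.append(dev)
        return dev

    def release_device(self, dev):
        dev.status = Status.INACTIVE  # kept in the pool for later reuse


pool = DevicePool()
first = pool.init_device("lib.so", "lightning.qubit", "{}")
pool.release_device(first)
second = pool.init_device("lib.so", "lightning.qubit", "{}")  # reuses `first`
third = pool.init_device("lib.so", "lightning.qubit", "{}")   # `first` is busy, new device
print(second is first, third is first, len(pool.devices))
```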
The quantum adjoint compiler routine has been extended to support function calls that affect the quantum state within an adjoint region. Note that the function may only provide a single result consisting of the quantum register. By itself this provides no user-facing changes, but compiler pass developers may now generate quantum adjoint operations around a block of code containing function calls as well as quantum operations and control flow operations. (#353)
The allocation and deallocation operations in MLIR (AllocOp, DeallocOp) now follow simple value semantics for qubit register values, instead of modelling memory in the MLIR trait system. Similarly, the frontend generates proper value semantics by deallocating the final register value.

The change enables functions at the MLIR level to accept and return quantum register values, which would otherwise not be correctly identified as aliases of existing register values by the bufferization system. (#360)

Breaking changes

Third party devices must now provide a configuration TOML file, in order to specify their supported operations, measurements, and features for Catalyst compatibility. For more information please visit the Custom Devices section in our documentation. (#369)

Bug fixes

Resolves a bug in the compiler’s differentiation engine that results in a segmentation fault when attempting to differentiate non-differentiable quantum operations. The fix ensures that all existing quantum operation types are removed during gradient passes that extract classical code from a QNode function. It also adds a verification step that will raise an error if a gradient pass cannot successfully eliminate all quantum operations for such functions. (#397)
Resolves a bug that caused unpredictable behaviour when printing string values with the debug.print function. The issue was caused by non-null-terminated strings. (#418)

Contributors

This release contains contributions from (in alphabetical order):

Ali Asadi, David Ittah, Romain Moyard, Sergei Mironov, Erick Ochoa Lopez, Shuli Shu.

Release 0.3.2¶

New features

The experimental AutoGraph feature now supports Python while loops, allowing native Python loops to be captured and compiled with Catalyst. (#318)
```
dev = qml.device("lightning.qubit", wires=4)

@qjit(autograph=True)
@qml.qnode(dev)
def circuit(n: int, x: float):
    i = 0

    while i < n:
        qml.RX(x, wires=i)
        i += 1

    return qml.expval(qml.PauliZ(0))
```
```
>>> circuit(4, 0.32)
array(0.94923542)
```
This feature extends the existing AutoGraph support for Python for loops and if statements introduced in v0.3. Note that TensorFlow must be installed for AutoGraph support.

For more details, please see the AutoGraph guide.
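Conceptually, the conversion rewrites a native loop into a functional form in which the loop state is passed explicitly between a condition function and a body function. The sketch below shows that shape in plain Python; the `while_loop` helper defined here is hypothetical and its signature is not that of `catalyst.while_loop`.

```python
def while_loop(cond_fn, body_fn, init):
    """Functional while loop: apply body_fn to the state while cond_fn holds."""
    state = init
    while cond_fn(state):
        state = body_fn(state)
    return state


# Native form being modelled:
#     i = 0
#     while i < n:
#         visited.append(i)
#         i += 1
def visit_wires(n):
    def cond_fn(state):
        i, _ = state
        return i < n

    def body_fn(state):
        i, visited = state
        return i + 1, visited + [i]  # build a new state instead of mutating in place

    return while_loop(cond_fn, body_fn, (0, []))


print(visit_wires(4))
```

Every variable modified inside the loop becomes part of the explicit state, which is what allows the loop to be captured when the trip count is only known at runtime.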
In addition to loops and conditional branches, AutoGraph now supports native Python and, or and not operators in Boolean expressions. (#325)
```
dev = qml.device("lightning.qubit", wires=1)

@qjit(autograph=True)
@qml.qnode(dev)
def circuit(x: float):

    if x >= 0 and x < jnp.pi:
        qml.RX(x, wires=0)

    return qml.probs()
```
```
>>> circuit(0.43)
array([0.95448287, 0.04551713])
>>> circuit(4.54)
array([1., 0.])
```
Note that logical Boolean operators will only be captured by AutoGraph if all operands are dynamic variables (that is, a value known only at runtime, such as a measurement result or function argument). For other use cases, it is recommended to use the jax.numpy.logical_* set of functions where appropriate.
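The reason these operators need special handling is a general Python property: `and`, `or`, and `not` cannot be overloaded, and instead call `bool()` on their operands. The toy `Traced` class below (hypothetical, not a JAX tracer) stands in for a value that is only known at runtime.

```python
class Traced:
    """Toy placeholder for a value only known at runtime (hypothetical)."""

    def __init__(self, expr):
        self.expr = expr

    def __bool__(self):
        raise TypeError("truth value of a traced value is not known at compile time")

    def __repr__(self):
        return self.expr


def logical_and(a, b):
    """Record the operation symbolically instead of evaluating it."""
    return Traced(f"and({a!r}, {b!r})")


x_nonneg = Traced("x >= 0")
x_small = Traced("x < pi")

try:
    x_nonneg and x_small  # Python calls bool() on the left operand
    outcome = "evaluated"
except TypeError:
    outcome = "TypeError"

print(outcome, logical_and(x_nonneg, x_small))
```

A function call such as `logical_and` can be recorded as part of the program, whereas the native operator forces an immediate truth value.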
Debug compiled programs and print dynamic values at runtime with debug.print. (#279) (#356)

You can now print arbitrary values from your running program, whether they are arrays, constants, strings, or arbitrary Python objects. Note that while non-array Python objects will be printed at runtime, their string representation is captured at compile time, and thus will always be the same regardless of program inputs. The output for arrays optionally includes a descriptor for how the data is stored in memory (“memref”).
```
@qjit
def func(x: float):
    debug.print(x, memref=True)
    debug.print("exit")
```
```
>>> func(jnp.array(0.43))
MemRef: base@ = 0x5629ff2b6680 rank = 0 offset = 0 sizes = [] strides = [] data =
0.43
exit
```
Catalyst now officially supports macOS X86_64 devices, with macOS binary wheels available for both AARCH64 and X86_64. (#347) (#313)
It is now possible to dynamically load third-party Catalyst compatible devices directly into a pre-installed Catalyst runtime on Linux. (#327)

To take advantage of this, third-party devices must implement the Catalyst::Runtime::QuantumDevice interface, in addition to defining the following method:
```
extern "C" Catalyst::Runtime::QuantumDevice*
getCustomDevice() { return new CustomDevice(); }
```
This support can also be integrated into existing PennyLane Python devices that inherit from the QuantumDevice class, by defining the get_c_interface static method.

For more details, see the custom devices documentation.

Improvements

Return values of conditional functions no longer need to be of exactly the same type. Type promotion is automatically applied to branch return values if their types don’t match. (#333)

@qjit
def func(i: int, f: float):

    @cond(i < 3)
    def cond_fn():
        return i

    @cond_fn.otherwise
    def otherwise():
        return f

    return cond_fn()

>>> func(1, 4.0)
array(1.0)

Automatic type promotion across conditional branches also works with AutoGraph:

@qjit(autograph=True)
def func(i: int, f: float):

    if i < 3:
        i = i
    else:
        i = f

    return i

>>> func(1, 4.0)
array(1.0)
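The promotion rule can be pictured with a small ordering of Python scalar types: both branch results are cast to the wider type before one of them is selected, so the result type does not depend on the predicate. This is a simplified model with hypothetical helpers, not the promotion rules Catalyst or JAX actually implement.

```python
PROMOTION_ORDER = [bool, int, float, complex]


def promote(*values):
    """Cast all values to the widest type among them (toy promotion lattice)."""
    widest = max((type(v) for v in values), key=PROMOTION_ORDER.index)
    return tuple(widest(v) for v in values)


def cond(pred, true_value, false_value):
    # Promote both branch results first, so the result type is independent of pred.
    true_value, false_value = promote(true_value, false_value)
    return true_value if pred else false_value


print(cond(1 < 3, 1, 4.0))
```

This mirrors the examples above, where the integer branch result comes back as a floating-point value.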

AutoGraph now supports converting functions even when they are invoked through functional wrappers such as adjoint, ctrl, grad, jacobian, etc. (#336)

For example, the following should now succeed:
```
def inner(n):
  for i in range(n):
    qml.T(i)

@qjit(autograph=True)
@qml.qnode(dev)
def f(n: int):
    adjoint(inner)(n)
    return qml.state()
```
To prepare for Catalyst’s frontend being integrated with PennyLane, the appropriate plugin entry point interface has been added to Catalyst. (#331)

For any compiler packages seeking to be registered in PennyLane, the entry_points metadata under the group name pennylane.compilers must be added, with the following entry points:
- context: Path to the compilation evaluation context manager. This context manager should have the method context.is_tracing(), which returns True if called within a program that is being traced or captured.
- ops: Path to the compiler operations module. This operations module may contain compiler specific versions of PennyLane operations. Within a JIT context, PennyLane operations may dispatch to these.
- qjit: Path to the JIT compiler decorator provided by the compiler. This decorator should have the signature qjit(fn, *args, **kwargs), where fn is the function to be compiled.
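Entry points registered under a group can be inspected with the standard library's `importlib.metadata`. The sketch below lists whatever is installed under the `pennylane.compilers` group; the helper name `discover` is made up for this example, and the result is simply empty when no compiler package is installed.

```python
from importlib.metadata import entry_points

GROUP = "pennylane.compilers"


def discover(group=GROUP):
    """Return {entry point name: target string} for the entry points in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and later
        selected = eps.select(group=group)
    else:  # older interface returns a dict of groups
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


print(discover())
```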
The compiler driver diagnostic output has been improved, and now includes failing IR as well as the names of failing passes. (#349)
The scatter operation in the Catalyst dialect now uses an SCF for loop to avoid ballooning the compiled code. (#307)
The CopyGlobalMemRefPass pass of our MLIR processing pipeline now supports dynamically shaped arrays. (#348)
The Catalyst utility dialect is now included in the Catalyst MLIR C-API. (#345)
Fix an issue with the AutoGraph conversion system that would prevent the fallback to Python from working correctly in certain instances. (#352)

The following type of code is now supported:
```
@qjit(autograph=True)
def f():
  l = jnp.array([1, 2])
  for _ in range(2):
      l = jnp.kron(l, l)
  return l
```
Catalyst now supports jax.numpy.polyfit inside a qjitted function. (#367)
Catalyst now supports custom calls (including the one from HLO). Support was added in MLIR (operation, bufferization, and lowering). Developers can then implement their custom calls in lib_custom_calls and use external functions directly (e.g. LAPACK). The OpenBLAS library is taken from SciPy and linked in Catalyst, so any function from it can be used. (#367)

Breaking changes

The axis ordering for catalyst.jacobian is updated to match jax.jacobian. Assuming we have parameters of shape [a,b] and results of shape [c,d], the returned Jacobian will now have shape [c, d, a, b] instead of [a, b, c, d]. (#283)
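The new layout places the result axes first and the parameter axes last. The shape arithmetic, and the mapping of an index from the old layout to the new one, can be sketched as follows (both helper names are hypothetical):

```python
def jacobian_shape(param_shape, result_shape):
    """Shape of the Jacobian in the new layout: result axes first, then parameter axes."""
    return tuple(result_shape) + tuple(param_shape)


def old_to_new_index(old_index, n_param_axes):
    """Convert an index in the old [a, b, c, d] layout to the new [c, d, a, b] layout."""
    return tuple(old_index[n_param_axes:]) + tuple(old_index[:n_param_axes])


print(jacobian_shape(param_shape=(2, 3), result_shape=(4, 5)))
print(old_to_new_index((0, 1, 2, 3), n_param_axes=2))
```

Code that indexed the old Jacobian as `J[a, b, c, d]` therefore needs to index the new one as `J[c, d, a, b]`.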

Bug fixes

An upstream change in the PennyLane-Lightning project was addressed to prevent compilation issues in the StateVectorLQubitDynamic class in the runtime. The issue was introduced in #499. (#322)
The requirements.txt file to build Catalyst from source has been updated with a minimum pip version, >=22.3. Previous versions of pip are unable to perform editable installs when the system-wide site-packages are read-only, even when the --user flag is provided. (#311)
The frontend has been updated to make it compatible with PennyLane MeasurementProcess objects now being PyTrees in PennyLane version 0.33. (#315)

Contributors

This release contains contributions from (in alphabetical order):

Ali Asadi, David Ittah, Sergei Mironov, Romain Moyard, Erick Ochoa Lopez.

Release 0.3.1¶

New features

The experimental AutoGraph feature now supports Python for loops, allowing native Python loops to be captured and compiled with Catalyst. (#258)
```
dev = qml.device("lightning.qubit", wires=4)  # n is not defined here; 4 wires assumed for illustration

@qjit(autograph=True)
@qml.qnode(dev)
def f(n):
    for i in range(n):
        qml.Hadamard(wires=i)

    return qml.expval(qml.PauliZ(0))
```
This feature extends the existing AutoGraph support for Python if statements introduced in v0.3. Note that TensorFlow must be installed for AutoGraph support.
The quantum control operation can now be used in conjunction with Catalyst control flow, such as loops and conditionals, via the new catalyst.ctrl function. (#282)

Similar in behaviour to the qml.ctrl control modifier from PennyLane, catalyst.ctrl can additionally wrap around quantum functions which contain control flow, such as the Catalyst cond, for_loop, and while_loop primitives.
```
@qjit
@qml.qnode(qml.device("lightning.qubit", wires=4))
def circuit(x):

    @for_loop(0, 3, 1)
    def repeat_rx(i):
        qml.RX(x / 2, wires=i)

    catalyst.ctrl(repeat_rx, control=3)()

    return qml.expval(qml.PauliZ(0))
```
```
>>> circuit(0.2)
array(1.)
```

Catalyst now supports JAX’s array.at[index] notation for array element assignment and updating. (#273)

@qjit
def add_multiply(l: jax.core.ShapedArray((3,), dtype=float), idx: int):
    res = l.at[idx].multiply(3)
    res2 = l.at[idx].add(2)
    return res + res2

res = add_multiply(jnp.array([0, 1, 2]), 2)

>>> res
[0, 2, 10]

For more details on available methods, see the JAX documentation.
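The key property of the `.at[...]` notation is that updates are functional: each call returns a new array and leaves the original untouched. The arithmetic of the example above can be reproduced with plain Python lists, using helper functions written only for this illustration:

```python
def at_add(values, idx, amount):
    """Return a copy of `values` with `amount` added at position `idx`."""
    out = list(values)
    out[idx] += amount
    return out


def at_multiply(values, idx, factor):
    """Return a copy of `values` with position `idx` multiplied by `factor`."""
    out = list(values)
    out[idx] *= factor
    return out


l = [0, 1, 2]
res = at_multiply(l, 2, 3)   # [0, 1, 6]
res2 = at_add(l, 2, 2)       # [0, 1, 4]
total = [a + b for a, b in zip(res, res2)]
print(total, l)
```

Since `l` is never modified, both updates start from the same original values.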

Improvements

The Lightning backend device has been updated to work with the new PL-Lightning monorepo. (#259) (#277)
A new compiler driver has been implemented in C++. This improves compile-time performance by avoiding round-tripping, which is when the entire program being compiled is dumped to a textual form and re-parsed by another tool.

This is also a requirement for providing custom metadata at the LLVM level, which is necessary for better integration with tools like Enzyme. Finally, this makes it more natural to improve error messages originating from C++ when compared to the prior subprocess-based approach. (#216)
Support the braket.devices.Devices enum class and s3_destination_folder device options for AWS Braket remote devices. (#278)
Improvements have been made to the build process, including avoiding unnecessary processes such as removing opt and downloading the wheel. (#298)
Remove a linker warning about duplicate rpaths when Catalyst wheels are installed on macOS. (#314)

Bug fixes

Fix incompatibilities with GCC on Linux introduced in v0.3.0 when compiling user programs. Due to these, Catalyst v0.3.0 only works when clang is installed in the user environment.
- Resolve an issue with an empty linker flag, causing ld to error. (#276)
- Resolve an issue with undefined symbols provided by the Catalyst runtime. (#316)
Remove undocumented package dependency on the zlib/zstd compression library. (#308)
Fix filesystem issue when compiling multiple functions with the same name and keep_intermediate=True. (#306)
Add support for applying the adjoint operation to QubitUnitary gates. Previously, QubitUnitary could not be adjointed when the variable holding the unitary matrix might change, for instance inside a for loop. To solve this issue, the unitary matrix is stored in the array list via pushes and pops. The matrix is later reconstructed from the array list, so QubitUnitary can be executed in the adjointed context. (#304) (#310)
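The push and pop idea can be illustrated with single-qubit matrices in plain Python: the forward pass pushes each loop-dependent unitary onto a stack, and the adjoint pass pops them in reverse order and applies their conjugate transposes. This is a conceptual sketch with made-up matrices, not the compiler's implementation.

```python
import cmath
import math


def matmul(a, b):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger(u):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(u[j][i]).conjugate() for j in range(2)] for i in range(2)]


def loop_unitary(i):
    """A unitary that changes on every iteration (RX-like rotation times a phase gate)."""
    c, s = math.cos(0.3 * (i + 1) / 2), math.sin(0.3 * (i + 1) / 2)
    rx = [[c, -1j * s], [-1j * s, c]]
    phase = [[1, 0], [0, cmath.exp(1j * 0.2 * i)]]
    return matmul(rx, phase)


identity = [[1, 0], [0, 1]]

# Forward pass: apply each unitary and push it onto a stack.
stack, forward = [], identity
for i in range(3):
    u = loop_unitary(i)
    stack.append(u)
    forward = matmul(u, forward)

# Adjoint pass: pop in reverse order and apply the conjugate transposes.
adjoint = identity
while stack:
    adjoint = matmul(dagger(stack.pop()), adjoint)

product = matmul(adjoint, forward)  # should be the identity up to rounding
print(product)
```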

Contributors

This release contains contributions from (in alphabetical order):

Ali Asadi, David Ittah, Erick Ochoa Lopez, Jacob Mai Peng, Sergei Mironov, Romain Moyard.

Release 0.3.0¶

New features

Catalyst now officially supports macOS ARM devices, such as Apple M1/M2 machines, with macOS binary wheels available on PyPI. For more details on the changes involved to support macOS, please see the improvements section. (#229) (#232) (#233) (#234)
Write Catalyst-compatible programs with native Python conditional statements. (#235)

AutoGraph is a new, experimental feature that automatically converts Python conditional statements like if, else, and elif, into their equivalent functional forms provided by Catalyst (such as catalyst.cond).

This feature is currently opt-in, and requires setting the autograph=True flag in the qjit decorator:
```
dev = qml.device("lightning.qubit", wires=1)

@qjit(autograph=True)
@qml.qnode(dev)
def f(x):
    if x < 0.5:
        qml.RY(jnp.sin(x), wires=0)
    else:
        qml.RX(jnp.cos(x), wires=0)

    return qml.expval(qml.PauliZ(0))
```
The implementation is based on the AutoGraph module from TensorFlow, and requires a working TensorFlow installation be available. In addition, Python loops (for and while) are not yet supported, and do not work in AutoGraph mode.

Note that there are some caveats when using this feature, especially around the use of global variables or object mutation inside of methods. A functional style is always recommended when using qjit or AutoGraph.
The quantum adjoint operation can now be used in conjunction with Catalyst control flow, such as loops and conditionals. For this purpose a new instruction, catalyst.adjoint, has been added. (#220)

catalyst.adjoint can wrap around quantum functions which contain the Catalyst cond, for_loop, and while_loop primitives. Previously, the usage of qml.adjoint on functions with these primitives would result in decomposition errors. Note that a future release of Catalyst will merge the behaviour of catalyst.adjoint into qml.adjoint for convenience.
```
dev = qml.device("lightning.qubit", wires=3)

@qjit
@qml.qnode(dev)
def circuit(x):

    @for_loop(0, 3, 1)
    def repeat_rx(i):
        qml.RX(x / 2, wires=i)

    adjoint(repeat_rx)()

    return qml.expval(qml.PauliZ(0))
```
```
>>> circuit(0.2)
array(0.99500417)
```
Additionally, the ability to natively represent the adjoint construct in Catalyst’s program representation (IR) was added.
QJIT-compiled programs now support (nested) container types as inputs and outputs of compiled functions. This includes lists and dictionaries, as well as any data structure implementing the PyTree protocol. (#215) (#221)

For example, a program that accepts and returns a mix of dictionaries, lists, and tuples:
```
@qjit
def workflow(params1, params2):
    res1 = params1["a"][0][0] + params2[1]
    return {"y1": jnp.sin(res1), "y2": jnp.cos(res1)}
```
```
>>> params1 = {"a": [[0.1], 0.2]}
>>> params2 = (0.6, 0.8)
>>> workflow(params1, params2)
{'y1': array(0.78332691), 'y2': array(0.62160997)}
```
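The PyTree idea is that a nested container can be split into a flat list of leaves plus a description of its structure, and rebuilt from the two. A minimal sketch for dictionaries, lists, and tuples follows; the helpers `flatten` and `unflatten` are written for this illustration and are far simpler than `jax.tree_util`.

```python
def flatten(tree):
    """Return (leaves, treedef) for nested dicts, lists and tuples."""
    if isinstance(tree, dict):
        keys = sorted(tree)
        parts = [flatten(tree[k]) for k in keys]
        leaves = [leaf for part_leaves, _ in parts for leaf in part_leaves]
        return leaves, ("dict", keys, [d for _, d in parts])
    if isinstance(tree, (list, tuple)):
        parts = [flatten(item) for item in tree]
        leaves = [leaf for part_leaves, _ in parts for leaf in part_leaves]
        return leaves, (type(tree).__name__, None, [d for _, d in parts])
    return [tree], ("leaf", None, [])


def unflatten(treedef, leaves):
    """Rebuild a nested container from its structure description and leaves."""
    leaves = iter(leaves)

    def build(node):
        kind, keys, children = node
        if kind == "leaf":
            return next(leaves)
        items = [build(child) for child in children]
        if kind == "dict":
            return dict(zip(keys, items))
        return items if kind == "list" else tuple(items)

    return build(treedef)


params1 = {"a": [[0.1], 0.2]}
leaves, treedef = flatten(params1)
print(leaves)
print(unflatten(treedef, leaves))
```

A compiled function only needs to handle the flat list of leaves; the structure description is used to unpack inputs and repack outputs.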

Compile-time backpropagation of arbitrary hybrid programs is now supported, via integration with Enzyme AD. (#158) (#193) (#224) (#225) (#239) (#244)

This allows catalyst.grad to differentiate hybrid functions that contain both classical pre-processing (inside & outside of QNodes), QNodes, as well as classical post-processing (outside of QNodes) via a combination of backpropagation and quantum gradient methods.

The default differentiation method in catalyst.grad has been changed to "auto", which performs Enzyme-based reverse mode AD on classical code, in conjunction with the quantum diff_method specified on each QNode:

dev = qml.device("lightning.qubit", wires=1)

@qml.qnode(dev, diff_method="parameter-shift")
def circuit(theta):
    qml.RX(jnp.exp(theta ** 2) / jnp.cos(theta / 4), wires=0)
    return qml.expval(qml.PauliZ(wires=0))

>>> grad = qjit(catalyst.grad(circuit, method="auto"))
>>> grad(jnp.pi)
array(0.05938718)

The reworked differentiation pipeline means you can now compute exact derivatives of programs with both classical pre- and post-processing, as shown below:

@qml.qnode(qml.device("lightning.qubit", wires=1), diff_method="adjoint")
def circuit(theta):
    qml.RX(jnp.exp(theta ** 2) / jnp.cos(theta / 4), wires=0)
    return qml.expval(qml.PauliZ(wires=0))

def loss(theta):
    return jnp.pi / jnp.tanh(circuit(theta))

@qjit
def grad_loss(theta):
    return catalyst.grad(loss)(theta)

>>> grad_loss(1.0)
array(-1.90958669)

You can also use multiple QNodes with different differentiation methods:

@qml.qnode(qml.device("lightning.qubit", wires=1), diff_method="parameter-shift")
def circuit_A(params):
    qml.RX(jnp.exp(params[0] ** 2) / jnp.cos(params[1] / 4), wires=0)
    return qml.probs()

@qml.qnode(qml.device("lightning.qubit", wires=1), diff_method="adjoint")
def circuit_B(params):
    qml.RX(jnp.exp(params[1] ** 2) / jnp.cos(params[0] / 4), wires=0)
    return qml.expval(qml.PauliZ(wires=0))

def loss(params):
    return jnp.prod(circuit_A(params)) + circuit_B(params)

@qjit
def grad_loss(theta):
    return catalyst.grad(loss)(theta)

>>> grad_loss(jnp.array([1.0, 2.0]))
array([ 0.57367285, 44.4911605 ])

And you can differentiate purely classical functions as well:

def square(x: float):
    return x ** 2

@qjit
def dsquare(x: float):
    return catalyst.grad(square)(x)

>>> dsquare(2.3)
array(4.6)

Note that the current implementation of reverse mode AD is restricted to 1st order derivatives, but catalyst.grad(method="fd") is still available to perform a finite differences approximation of any differentiable function.

Add support for the new PennyLane arithmetic operators. (#250)

PennyLane is in the process of replacing Hamiltonian and Tensor observables with a set of general arithmetic operators. These consist of Prod, Sum and SProd.

By default, using dunder methods (eg. +, -, @, *) to combine operators with scalars or other operators will create Hamiltonian and Tensor objects. However, these two methods will be deprecated in coming releases of PennyLane.

To enable the new arithmetic operators, use Prod, Sum, and SProd directly, or activate them by calling enable_new_opmath at the beginning of your PennyLane program.
```
dev = qml.device("lightning.qubit", wires=2)

@qjit
@qml.qnode(dev)
def circuit(x: float, y: float):
    qml.RX(x, wires=0)
    qml.RX(y, wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(0.2 * qml.PauliX(wires=0) - 0.4 * qml.PauliY(wires=1))
```
```
>>> qml.operation.enable_new_opmath()
>>> qml.operation.active_new_opmath()
True
>>> circuit(np.pi / 4, np.pi / 2)
array(0.28284271)
```

Improvements

Better support for Hamiltonian observables:

Allow Hamiltonian observables with integer coefficients. (#248)

For example, compiling the following circuit wasn’t previously allowed, but is now supported in Catalyst:

dev = qml.device("lightning.qubit", wires=2)

@qjit
@qml.qnode(dev)
def circuit(x: float, y: float):
    qml.RX(x, wires=0)
    qml.RY(y, wires=1)

    coeffs = [1, 2]
    obs = [qml.PauliZ(0), qml.PauliZ(1)]
    return qml.expval(qml.Hamiltonian(coeffs, obs))

Allow nested Hamiltonian observables. (#255)

@qjit
@qml.qnode(qml.device("lightning.qubit", wires=3))
def circuit(x, y, coeffs1, coeffs2):
    qml.RX(x, wires=0)
    qml.RX(y, wires=1)
    qml.RY(x + y, wires=2)

    obs = [
        qml.PauliX(0) @ qml.PauliZ(1),
        qml.Hamiltonian(coeffs1, [qml.PauliZ(0) @ qml.Hadamard(2)]),
    ]

    return qml.var(qml.Hamiltonian(coeffs2, obs))

Various performance improvements:
- The execution and compile time of programs has been reduced, by generating more efficient code and avoiding unnecessary optimizations. Specifically, a scalarization procedure was added to the MLIR pass pipeline, and LLVM IR compilation is now invoked with optimization level 0. (#217)
- The execution time of compiled functions has been improved in the frontend. (#213)
  
  Specifically, the following changes have been made, which lead to a small but measurable improvement when using larger matrices as inputs, or functions with many inputs:
  - only loading the user program library once per compilation,
  - generating return value types only once per compilation,
  - avoiding unnecessary type promotion, and
  - avoiding unnecessary array copies.
- Peak memory utilization of a JIT compiled program has been reduced, by allowing tensors to be scheduled for deallocation. Previously, the tensors were not deallocated until the end of the call to the JIT compiled function. (#201)

Various improvements have been made to enable Catalyst to compile on macOS:
- Remove unnecessary reinterpret_cast from ObsManager. Removing these casts allows the runtime to compile successfully on macOS. macOS uses an ILP32 mode for AArch64, in which the full 64-bit mode is used but with 32-bit integers, longs, and pointers. This patch also changes a test file to prevent a mismatch on machines that compile using ILP32 mode. (#229)
- Allow runtime to be compiled on macOS. Substitute nproc with a call to os.cpu_count() and use correct flags for ld.64. (#232)
- Improve portability on the frontend to be available on macOS. Use .dylib, remove unnecessary flags, and address behaviour difference in flags. (#233)
- Small compatibility changes in order for all integration tests to succeed on macOS. (#234)

Dialects can compile with older versions of clang by avoiding type mismatches. (#228)
The runtime is now built against qir-stdlib pre-built artifacts. (#236)
Small improvements have been made to the CI/CD, including fixing the Enzyme cache, generalizing caches to other operating systems, fixing the build wheel recipe, and removing references to QIR in the runtime’s Makefile. (#243) (#247)

Breaking changes

Support for Python 3.8 has been removed. (#231)
The default differentiation method on grad and jacobian is reverse-mode automatic differentiation instead of finite differences. When a QNode does not have a diff_method specified, it will default to using the parameter shift method instead of finite-differences. (#244) (#271)
The JAX version used by Catalyst has been updated to v0.4.14, and the minimum PennyLane version required is now v0.32. (#264)
Due to the change allowing Python container objects as inputs to QJIT-compiled functions, Python lists are no longer automatically converted to JAX arrays. (#231)

This means that indexing on lists when the index is not static will cause a TracerIntegerConversionError, consistent with JAX’s behaviour.

That is, the following example is no longer supported:
```
@qjit
def f(x: list, index: int):
    return x[index]
```
However, if the parameter x above is a JAX or NumPy array, the compilation will continue to succeed.
The catalyst.grad function has been renamed to catalyst.jacobian and supports differentiation of functions that return multiple or non-scalar outputs. A new catalyst.grad function has been added that enforces that it is differentiating a function with a single scalar return value. (#254)

Bug fixes

Fixed an issue preventing the differentiation of qml.probs with the parameter-shift method. (#211)
Fixed the incorrect return value data-type with functions returning qml.counts. (#221)
Fixed a segmentation fault when differentiating a function where a quantum measurement is used multiple times by the same operation. (#242)

Contributors

This release contains contributions from (in alphabetical order):

Ali Asadi, David Ittah, Erick Ochoa Lopez, Jacob Mai Peng, Romain Moyard, Sergei Mironov.

Release 0.2.1¶

Bug fixes

Add missing OpenQASM backend in binary distribution, which relies on the latest version of the AWS Braket plugin for PennyLane to resolve dependency issues between the plugin, Catalyst, and PennyLane. The Lightning-Kokkos backend with Serial and OpenMP modes is also added to the binary distribution. #198
Return a list of decompositions when calling the decomposition method for control operations. This allows Catalyst to be compatible with upstream PennyLane. #241

Improvements

When using OpenQASM-based devices the string representation of the circuit is printed on exception. #199
Use pybind11::module interface library instead of pybind11::embed in the runtime for the OpenQASM backend to avoid linking to the Python library at compile time. #200

Contributors

This release contains contributions from (in alphabetical order):

Ali Asadi, David Ittah.

Release 0.2.0¶

New features

Catalyst programs can now be used inside of a larger JAX workflow which uses JIT compilation, automatic differentiation, and other JAX transforms. #96 #123 #167 #192

For example, call a Catalyst qjit-compiled function from within a JAX jit-compiled function:

dev = qml.device("lightning.qubit", wires=1)

@qjit
@qml.qnode(dev)
def circuit(x):
    qml.RX(jnp.pi * x[0], wires=0)
    qml.RY(x[1] ** 2, wires=0)
    qml.RX(x[1] * x[2], wires=0)
    return qml.probs(wires=0)

@jax.jit
def cost_fn(weights):
    x = jnp.sin(weights)
    return jnp.sum(jnp.cos(circuit(x)) ** 2)

>>> cost_fn(jnp.array([0.1, 0.2, 0.3]))
Array(1.32269195, dtype=float64)

Catalyst-compiled functions can now also be automatically differentiated via JAX, both in forward and reverse mode to first-order,

>>> jax.grad(cost_fn)(jnp.array([0.1, 0.2, 0.3]))
Array([0.49249037, 0.05197949, 0.02991883], dtype=float64)

as well as vectorized using jax.vmap:

>>> jax.vmap(cost_fn)(jnp.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]))
Array([1.32269195, 1.53905377], dtype=float64)

In particular, this allows for a reduction in boilerplate when using JAX-compatible optimizers such as jaxopt:

>>> opt = jaxopt.GradientDescent(cost_fn)
>>> params = jnp.array([0.1, 0.2, 0.3])
>>> (final_params, _) = jax.jit(opt.run)(params)
>>> final_params
Array([-0.00320799,  0.03475223,  0.29362844], dtype=float64)

Note that, in general, the best performance will be seen when the Catalyst @qjit decorator is used to JIT the entire hybrid workflow. However, there may be cases where you want to delegate only the quantum part of your workflow to Catalyst, and let JAX handle classical components (for example, due to a missing feature or compatibility issue in Catalyst).

Support for Amazon Braket devices provided via the PennyLane-Braket plugin. #118 #139 #179 #180

This enables quantum subprograms within a JIT-compiled Catalyst workflow to execute on Braket simulator and hardware devices, including remote cloud-based simulators such as SV1.
```
def circuit(x, y):
    qml.RX(y * x, wires=0)
    qml.RX(x * 2, wires=1)
    return qml.expval(qml.PauliY(0) @ qml.PauliZ(1))

@qjit
def workflow(x: float, y: float):
    device = qml.device("braket.local.qubit", backend="braket_sv", wires=2)
    g = qml.qnode(device)(circuit)
    h = catalyst.grad(g)
    return h(x, y)

workflow(1.0, 2.0)
```
For a list of available devices, please see the PennyLane-Braket documentation.

Internally, the quantum instructions generate OpenQASM3 kernels at runtime; these are then executed on both local (braket.local.qubit) and remote (braket.aws.qubit) devices backed by the Amazon Braket Python SDK, with measurement results then propagated back to the frontend.

Note that at initial release, not all Catalyst features are supported with Braket. In particular, dynamic circuit features, such as mid-circuit measurements, will not work with Braket devices.

Catalyst conditional functions defined via @catalyst.cond now support an arbitrary number of ‘else if’ chains. #104

dev = qml.device("lightning.qubit", wires=1)

@qjit
@qml.qnode(dev)
def circuit(x):

    @catalyst.cond(x > 2.7)
    def cond_fn():
        qml.RX(x, wires=0)

    @cond_fn.else_if(x > 1.4)
    def cond_elif():
        qml.RY(x, wires=0)

    @cond_fn.otherwise
    def cond_else():
        qml.RX(x ** 2, wires=0)

    cond_fn()

    return qml.probs(wires=0)
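
The chain above follows ordinary if/elif/else semantics: the first branch whose predicate holds is the one applied. As a plain-Python analogue (selected_gate is a hypothetical helper for illustration, not part of Catalyst):

```python
def selected_gate(x):
    """Return the gate name and angle the conditional chain would apply for x."""
    if x > 2.7:        # corresponds to @catalyst.cond(x > 2.7)
        return ("RX", x)
    elif x > 1.4:      # corresponds to @cond_fn.else_if(x > 1.4)
        return ("RY", x)
    else:              # corresponds to @cond_fn.otherwise
        return ("RX", x ** 2)

print(selected_gate(3.0))  # ('RX', 3.0)
print(selected_gate(2.0))  # ('RY', 2.0)
print(selected_gate(1.0))  # ('RX', 1.0)
```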

Iterating in reverse is now supported with constant negative step sizes via catalyst.for_loop. #129

dev = qml.device("lightning.qubit", wires=1)

@qjit
@qml.qnode(dev)
def circuit(n):

    @catalyst.for_loop(n, 0, -1)
    def loop_fn(_):
        qml.PauliX(0)

    loop_fn()
    return catalyst.measure(0)
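
To see which iterations run, the sketch below assumes that catalyst.for_loop(lower, upper, step) visits the same indices as Python's range(lower, upper, step). The helper names are hypothetical and the code is plain Python, not Catalyst.

```python
def loop_indices(n):
    """Indices visited when counting from n down to (but excluding) 0 in steps of -1."""
    return list(range(n, 0, -1))

def expected_measurement(n):
    """PauliX flips the qubit once per iteration, so the outcome is the parity of n."""
    return len(loop_indices(n)) % 2 == 1

print(loop_indices(4))          # [4, 3, 2, 1]
print(expected_measurement(3))  # True
print(expected_measurement(4))  # False
```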

Additional gradient transforms for computing the vector-Jacobian product (VJP) and Jacobian-vector product (JVP) are now available in Catalyst. #98

Use catalyst.vjp to compute the forward-pass value and VJP:

@qjit
def vjp(params, cotangent):
    def f(x):
        y = [jnp.sin(x[0]), x[1] ** 2, x[0] * x[1]]
        return jnp.stack(y)

    return catalyst.vjp(f, [params], [cotangent])

>>> x = jnp.array([0.1, 0.2])
>>> dy = jnp.array([-0.5, 0.1, 0.3])
>>> vjp(x, dy)
[array([0.09983342, 0.04      , 0.02      ]),
 array([-0.43750208,  0.07000001])]

Use catalyst.jvp to compute the forward-pass value and JVP:

@qjit
def jvp(params, tangent):
    def f(x):
        y = [jnp.sin(x[0]), x[1] ** 2, x[0] * x[1]]
        return jnp.stack(y)

    return catalyst.jvp(f, [params], [tangent])

>>> x = jnp.array([0.1, 0.2])
>>> tangent = jnp.array([0.3, 0.6])
>>> jvp(x, tangent)
[array([0.09983342, 0.04      , 0.02      ]),
 array([0.29850125, 0.24000006, 0.12      ])]
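
The second array in each result above can be cross-checked by hand. The following sketch uses plain Python (no Catalyst or JAX) and an analytic Jacobian of f written out manually; the helper name jacobian is an assumption made for illustration.

```python
import math

def jacobian(x0, x1):
    """Analytic Jacobian of f(x) = [sin(x0), x1**2, x0*x1]; rows are outputs."""
    return [
        [math.cos(x0), 0.0],
        [0.0, 2 * x1],
        [x1, x0],
    ]

J = jacobian(0.1, 0.2)

# VJP: contract the cotangent with the rows of J (computes J^T @ dy).
dy = [-0.5, 0.1, 0.3]
vjp_result = [sum(dy[i] * J[i][j] for i in range(3)) for j in range(2)]

# JVP: contract the tangent with the columns of J (computes J @ tangent).
tangent = [0.3, 0.6]
jvp_result = [sum(J[i][j] * tangent[j] for j in range(2)) for i in range(3)]

print(vjp_result)  # approximately [-0.4375, 0.07]
print(jvp_result)  # approximately [0.2985, 0.24, 0.12]
```

The VJP has the shape of the input (two entries) while the JVP has the shape of the output (three entries), matching the arrays shown above.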

Support for multiple backend devices within a single qjit-compiled function is now available. #86 #89

For example, if you compile the Catalyst runtime with lightning.kokkos support (via the compilation flag ENABLE_LIGHTNING_KOKKOS=ON), you can use lightning.qubit and lightning.kokkos within a single workflow:

dev1 = qml.device("lightning.qubit", wires=1)
dev2 = qml.device("lightning.kokkos", wires=1)

@qml.qnode(dev1)
def circuit1(x):
    qml.RX(jnp.pi * x[0], wires=0)
    qml.RY(x[1] ** 2, wires=0)
    qml.RX(x[1] * x[2], wires=0)
    return qml.var(qml.PauliZ(0))

@qml.qnode(dev2)
def circuit2(x):

    @catalyst.cond(x > 2.7)
    def cond_fn():
        qml.RX(x, wires=0)

    @cond_fn.otherwise
    def cond_else():
        qml.RX(x ** 2, wires=0)

    cond_fn()

    return qml.probs(wires=0)

@qjit
def cost(x):
    return circuit2(circuit1(x))

>>> x = jnp.array([0.54, 0.31])
>>> cost(x)
array([0.80842369, 0.19157631])

Support for returning the variance of Hamiltonians, Hermitian matrices, and Tensors via qml.var has been added. #124

dev = qml.device("lightning.qubit", wires=2)

@qjit
@qml.qnode(dev)
def circuit(x):
    qml.RX(jnp.pi * x[0], wires=0)
    qml.RY(x[1] ** 2, wires=1)
    qml.CNOT(wires=[0, 1])
    qml.RX(x[1] * x[2], wires=0)
    return qml.var(qml.PauliZ(0) @ qml.PauliX(1))

>>> x = jnp.array([0.54, 0.31])
>>> circuit(x)
array(0.98851544)

Breaking changes

The catalyst.grad function now supports using the differentiation method defined on the QNode (via the diff_method argument) rather than applying a global differentiation method. #163

As part of this change, the method argument now accepts the following options:
- method="auto": Quantum components of the hybrid function are differentiated according to the corresponding QNode diff_method, while the classical computation is differentiated using traditional auto-diff.
  
  With this strategy, Catalyst currently only supports QNodes with diff_method="param-shift" and diff_method="adjoint".
- method="fd": First-order finite-differences for the entire hybrid function. The diff_method argument for each QNode is ignored.

This is an intermediate step towards differentiating functions that internally call multiple QNodes, and towards supporting differentiation of classical postprocessing.

Improvements

Catalyst has been upgraded to work with JAX v0.4.13. #143 #185
Add a Backprop operation for using autodifferentiation (AD) at the LLVM level with Enzyme AD. The Backprop operation has a bufferization pattern and a lowering to LLVM. #107 #116
Error handling has been improved. The runtime now throws more descriptive and unified exceptions for runtime errors and assertions. #92
In preparation for easier debugging, the compiler has been refactored to allow easy prototyping of new compilation pipelines. #38

In the future, this will make it possible to generate MLIR or LLVM-IR by loading input from a string or file, rather than generating it from Python.

As part of this refactor, the following changes were made:
- Passes are now classes. This allows developers/users looking to change flags to inherit from these passes and change the flags.
- Passes are now passed as arguments to the compiler. Custom passes can be passed to the compiler as an argument, as long as they implement a run method that takes an input and whose output can be fed to the next pass.
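
The shape of this design can be sketched in plain Python. Every name below (Pass, Uppercase, AppendFlags, AppendOtherFlags, compile_with) is hypothetical and chosen for illustration only; none of them is Catalyst's actual compiler API.

```python
class Pass:
    """Hypothetical base class: each pass turns one input into one output."""
    flags = ()

    def run(self, program):
        raise NotImplementedError

class Uppercase(Pass):
    def run(self, program):
        return program.upper()

class AppendFlags(Pass):
    flags = ("-O0",)

    def run(self, program):
        return program + " " + " ".join(self.flags)

class AppendOtherFlags(AppendFlags):
    # Changing flags only requires inheriting from the existing pass.
    flags = ("-O2", "--verbose")

def compile_with(passes, program):
    """Feed the output of each pass into the next one."""
    for p in passes:
        program = p.run(program)
    return program

print(compile_with([Uppercase(), AppendFlags()], "module"))       # MODULE -O0
print(compile_with([Uppercase(), AppendOtherFlags()], "module"))  # MODULE -O2 --verbose
```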

Improved Python compatibility by providing a stable signature for user-generated functions. #106
Handle C++ exceptions without unwinding the whole stack. #99
Reduce the number of classical invocations by counting the number of gate parameters in the argmap function. #136

Prior to this, the computation of hybrid gradients executed all of the classical code being differentiated in a pcount function that solely counted the number of gate parameters in the quantum circuit. This was so argmap and other downstream functions could allocate memrefs large enough to store all gate parameters.

Now, instead of counting the number of parameters separately, a dynamically-resizable array is used in the argmap function directly to store the gate parameters. This removes one invocation of all of the classical code being differentiated.
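
The saving can be illustrated with a small plain-Python model. It is only a sketch of the idea: classical_code, argmap_two_pass, and argmap_one_pass are hypothetical stand-ins, and the real change operates on memrefs in the compiled program rather than on Python lists.

```python
calls = {"classical": 0}

def classical_code(x):
    """Stand-in for the classical preprocessing that produces gate parameters."""
    calls["classical"] += 1
    return [x, x ** 2, x + 1.0]

def argmap_two_pass(x):
    # Old scheme: a separate counting pass sizes the buffer first.
    pcount = len(classical_code(x))
    buffer = [0.0] * pcount
    for i, p in enumerate(classical_code(x)):
        buffer[i] = p
    return buffer

def argmap_one_pass(x):
    # New scheme: a growable array is filled directly, with no counting pass.
    buffer = []
    for p in classical_code(x):
        buffer.append(p)
    return buffer

calls["classical"] = 0
two = argmap_two_pass(0.5)
two_pass_calls = calls["classical"]  # 2 invocations of the classical code

calls["classical"] = 0
one = argmap_one_pass(0.5)
one_pass_calls = calls["classical"]  # 1 invocation of the classical code
```

Both variants produce the same parameters; the one-pass version simply evaluates the classical code once instead of twice.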
Use Tablegen to define MLIR passes instead of C++ to reduce overhead of adding new passes. #157
Perform constant folding on wire indices for quantum.insert and quantum.extract ops, used when writing (resp. reading) qubits to (resp. from) quantum registers. #161
Represent known named observables as members of an MLIR Enum rather than a raw integer. This improves IR readability. #165

Bug fixes

Fix a bug in the mapping from logical to concrete qubits for mid-circuit measurements. #80
Fix a bug in the way gradient result type is inferred. #84
Fix a memory regression and reduce memory footprint by removing unnecessary temporary buffers. #100
Provide a new abstraction to the QuantumDevice interface in the runtime called DataView. C++ implementations of the interface can iterate through and directly store results into the DataView independent of the underlying memory layout. This can eliminate redundant buffer copies at the interface boundaries, which has been applied to existing devices. #109
Reduce memory utilization by transferring ownership of buffers from the runtime to Python instead of copying them. This includes adding a compiler pass that copies global buffers into the heap as global buffers cannot be transferred to Python. #112
Temporarily fix a use-after-free and a dependency on uninitialized memory. #121
Fix file renaming within pass pipelines. #126
Fix the issue with the do_queue deprecation warnings in PennyLane. #146

Fix the issue with gradients failing to work with hybrid functions that contain constant jnp.array objects. This will enable PennyLane operators that have data in the form of a jnp.array, such as a Hamiltonian, to be included in a qjit-compiled function. #152

An example of a newly supported workflow:

coeffs = jnp.array([0.1, 0.2])
terms = [qml.PauliX(0) @ qml.PauliZ(1), qml.PauliZ(0)]
H = qml.Hamiltonian(coeffs, terms)

@qjit
@qml.qnode(qml.device("lightning.qubit", wires=2))
def circuit(x):
    qml.RX(x[0], wires=0)
    qml.RY(x[1], wires=0)
    qml.CNOT(wires=[0, 1])
    return qml.expval(H)

params = jnp.array([0.3, 0.4])
jax.grad(circuit)(params)

Contributors

This release contains contributions from (in alphabetical order):

Ali Asadi, David Ittah, Erick Ochoa Lopez, Jacob Mai Peng, Romain Moyard, Sergei Mironov.

Release 0.1.2¶

New features

Add an option to print verbose messages explaining the compilation process. #68
Allow catalyst.grad to be used on any traceable function (within a qjit context). This means the operation is no longer restricted to acting on qml.qnodes only. #75

Improvements

Work in progress on a Lightning-Kokkos backend:

Bring feature parity to the Lightning-Kokkos backend simulator. #55

Add support for variance measurements for all observables. #70
Build the runtime against qir-stdlib v0.1.0. #58
Replace input-checking assertions with exceptions. #67
Perform function inlining to improve optimizations and memory management within the compiler. #72

Breaking changes

Bug fixes

Several fixes to address memory leaks in the compiled program:

Fix memory leaks from data that flows back into the Python environment. #54

Fix memory leaks resulting from partial bufferization at the MLIR level. This fix makes the necessary changes to reintroduce the -buffer-deallocation pass into the MLIR pass pipeline. The pass guarantees that all allocations contained within a function (that is, allocations that are not returned from a function) are also deallocated. #61

Lift heap allocations for quantum op results from the runtime into the MLIR compiler core. This allows all memref buffers to be memory managed in MLIR using the MLIR bufferization infrastructure. #63

Eliminate all memory leaks by tracking memory allocations at runtime. Memory allocations that are still alive when the compiled function terminates will be freed in the finalization / teardown function. #78
Fix returning complex scalars from the compiled function. #77

Contributors

This release contains contributions from (in alphabetical order):

Ali Asadi, David Ittah, Erick Ochoa Lopez, Sergei Mironov.

Release 0.1.1¶

New features

Adds support for interpreting control flow operations. #31

Improvements

Adds fallback compiler drivers to increase reliability during linking phase. Also adds support for a CATALYST_CC environment variable for manual specification of the compiler driver used for linking. #30

Breaking changes

Bug fixes

Fixes the Catalyst image path in the readme to properly render on PyPI.

Contributors

This release contains contributions from (in alphabetical order):

Ali Asadi, Erick Ochoa Lopez.

Release 0.1.0¶

Initial public release.

Contributors

This release contains contributions from (in alphabetical order):

Ali Asadi, Sam Banning, David Ittah, Josh Izaac, Erick Ochoa Lopez, Sergei Mironov, Isidor Schoch.