AVX2/512 kernel build system
In Lightning Qubit, a kernel is registered to Pennylane::LightningQubit::DynamicDispatcher
when the library is loaded, and it is used at runtime when it is the most suitable kernel for the given input.
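The load-time registration described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Lightning Qubit code: `ToyDispatcher`, `registerToyKernels`, and `toy_kernels_registered` are hypothetical names, and the real `DynamicDispatcher` has a different interface.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical, heavily simplified stand-in for a dynamic dispatcher.
class ToyDispatcher {
  public:
    using Kernel = std::function<double(double)>;

    // Function-local static: constructed on first use, so it is safe to
    // call from static initializers in other translation units.
    static ToyDispatcher &getInstance() {
        static ToyDispatcher instance;
        return instance;
    }

    void registerKernel(const std::string &name, Kernel kernel) {
        kernels_[name] = std::move(kernel);
    }

    bool isRegistered(const std::string &name) const {
        return kernels_.count(name) != 0;
    }

    // Looks up the kernel by name and applies it; throws if it is missing.
    double apply(const std::string &name, double x) const {
        return kernels_.at(name)(x);
    }

  private:
    ToyDispatcher() = default;
    std::map<std::string, Kernel> kernels_;
};

// Returns int so that the call can initialize a static variable.
int registerToyKernels() {
    ToyDispatcher::getInstance().registerKernel(
        "LM", [](double x) { return 2.0 * x; });
    return 1;
}

// The dynamic initializer of this variable performs the registration,
// typically before main() runs or when the shared library is loaded.
static const int toy_kernels_registered = registerToyKernels();
```

Returning `int` from the registration function is what allows it to run as the initializer of a static variable, which mirrors the `return 1;` convention used by the registration functions in this document.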
To support AVX2 and AVX512 kernels, we always compile those kernels if the target system is UNIX on x86-64.
Specifically, we keep the AVX2 and AVX512 kernels in separate C++ files and build them as a static library with the corresponding compile options. This is handled by CMake; see the gates/CMakeLists.txt
file for details.
One caveat is that we want to ensure that the default KernelType::LM
kernels are instantiated only once, with one specific set of compiler flags, during the build.
This is important because the linker cannot always choose the right instantiation when the same template class is instantiated more than once: it keeps a single copy, and if that copy was compiled with AVX flags, the default kernel may execute instructions that the CPU does not support.
The problem does not arise when all instantiations are compiled with the same options, but the AVX2/512 kernels use different compile options for each translation unit. We solve this by adding explicit instantiation declarations for these kernels to the header file
(GateImplementationsLM.hpp)
and compiling the corresponding instantiations as a separate static library.
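As an illustration of the mechanism, the sketch below shows an explicit instantiation declaration (`extern template`) paired with an explicit instantiation definition. `ToyKernelLM` and `applyScale` are hypothetical names chosen for this example; the actual declarations in GateImplementationsLM.hpp may look different. For brevity everything is shown in one file, with comments marking what would go in the header and what would go in the single source file.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// --- would live in the header ---
// Hypothetical kernel class template standing in for a default kernel.
template <class PrecisionT> struct ToyKernelLM {
    // Scales every element of `data` in place by `factor`.
    static void applyScale(std::vector<PrecisionT> &data, PrecisionT factor);
};

template <class PrecisionT>
void ToyKernelLM<PrecisionT>::applyScale(std::vector<PrecisionT> &data,
                                         PrecisionT factor) {
    for (std::size_t i = 0; i < data.size(); ++i) {
        data[i] *= factor;
    }
}

// Explicit instantiation declarations: translation units that include the
// header do not implicitly instantiate these specializations themselves,
// whatever compile options they are built with.
extern template struct ToyKernelLM<float>;
extern template struct ToyKernelLM<double>;

// --- would live in exactly one .cpp file, built with the baseline flags ---
// Explicit instantiation definitions: the only place where code for these
// specializations is generated.
template struct ToyKernelLM<float>;
template struct ToyKernelLM<double>;
```

With this layout, a translation unit compiled with AVX2 or AVX512 options can still call `ToyKernelLM<double>::applyScale`, but the call resolves at link time to the single copy built with the baseline flags.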
With this, the AVX2/512 kernels are always included in the binary when compiled for UNIX-compatible OSs on the x86-64 architecture.
However, we register these kernels to Pennylane::LightningQubit::DynamicDispatcher
only when the runtime environment supports the corresponding instruction sets.
namespace Pennylane::LightningQubit::Internal {
int registerAllAvailableKernels_Float() {
    using Pennylane::Util::RuntimeInfo;
    // The default LM kernel is always registered.
    registerKernel<float, float, Gates::GateImplementationsLM>();
    if (RuntimeInfo::AVX2() && RuntimeInfo::FMA()) {
        registerKernelsAVX2_Float();
    }
    if (RuntimeInfo::AVX512F()) {
        registerKernelsAVX512_Float();
    }
    return 1;
}
int registerAllAvailableKernels_Double() {
    using Pennylane::Util::RuntimeInfo;
    // The default LM kernel is always registered.
    registerKernel<double, double, Gates::GateImplementationsLM>();
    if (RuntimeInfo::AVX2() && RuntimeInfo::FMA()) {
        registerKernelsAVX2_Double();
    }
    if (RuntimeInfo::AVX512F()) {
        registerKernelsAVX512_Double();
    }
    return 1;
}
} // namespace Pennylane::LightningQubit::Internal
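The runtime checks above rely on querying the CPU at startup. The sketch below shows one way such a query could be written, assuming GCC or Clang on x86-64 and their `__builtin_cpu_supports` builtin; we do not claim that `Pennylane::Util::RuntimeInfo` is implemented this way. `ToyRuntimeInfo`, `selectKernelFamilies`, and `selectKernelFamiliesForThisCPU` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// True when the compiler builtins used below are available (assumption:
// GCC or Clang targeting x86-64). Elsewhere every query reports false.
#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
#define TOY_HAS_CPU_BUILTINS 1
#else
#define TOY_HAS_CPU_BUILTINS 0
#endif

// Hypothetical stand-in for a runtime CPU feature query.
struct ToyRuntimeInfo {
    static bool AVX2() {
#if TOY_HAS_CPU_BUILTINS
        __builtin_cpu_init(); // harmless to call more than once
        return __builtin_cpu_supports("avx2") != 0;
#else
        return false;
#endif
    }
    static bool FMA() {
#if TOY_HAS_CPU_BUILTINS
        __builtin_cpu_init();
        return __builtin_cpu_supports("fma") != 0;
#else
        return false;
#endif
    }
    static bool AVX512F() {
#if TOY_HAS_CPU_BUILTINS
        __builtin_cpu_init();
        return __builtin_cpu_supports("avx512f") != 0;
#else
        return false;
#endif
    }
};

// Mirrors the registration logic: the default kernel family is always
// selected, the AVX families only when the required features are present.
std::vector<std::string> selectKernelFamilies(bool avx2, bool fma,
                                              bool avx512f) {
    std::vector<std::string> families{"LM"};
    if (avx2 && fma) {
        families.push_back("AVX2");
    }
    if (avx512f) {
        families.push_back("AVX512");
    }
    return families;
}

// Convenience wrapper that queries the machine the program is running on.
std::vector<std::string> selectKernelFamiliesForThisCPU() {
    return selectKernelFamilies(ToyRuntimeInfo::AVX2(), ToyRuntimeInfo::FMA(),
                                ToyRuntimeInfo::AVX512F());
}
```

Separating the pure selection logic from the hardware query keeps the decision table testable on any machine, since the feature flags can be supplied explicitly.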
Likewise, we inform Pennylane::KernelMap::OperationKernelMap
to use the AVX2/512 kernels when aligned memory is used.
int assignKernelsForGateOp() {
    assignKernelsForGateOp_Default();
    if (RuntimeInfo::AVX2() && RuntimeInfo::FMA()) {
        assignKernelsForGateOp_AVX2(CPUMemoryModel::Aligned256);
        // LCOV_EXCL_START
        if (!RuntimeInfo::AVX512F()) {
            assignKernelsForGateOp_AVX2(CPUMemoryModel::Aligned512);
        }
        // LCOV_EXCL_STOP
    }
    // LCOV_EXCL_START
    if (RuntimeInfo::AVX512F()) {
        assignKernelsForGateOp_AVX512(CPUMemoryModel::Aligned512);
    }
    // LCOV_EXCL_STOP
    return 1;
}
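The resulting assignment can be summarized as a small decision table. The sketch below reproduces the branching of the function above with hypothetical types (`ToyMemoryModel`, `ToyKernel`, `assignToyKernels`); it assumes that the default assignment maps every memory model to the LM kernel, which the listing above does not show.

```cpp
#include <cassert>
#include <map>

// Hypothetical simplifications of the memory model and kernel type enums.
enum class ToyMemoryModel { Unaligned, Aligned256, Aligned512 };
enum class ToyKernel { LM, AVX2, AVX512 };

// Returns which kernel would serve each memory model, given the CPU
// features detected at runtime.
std::map<ToyMemoryModel, ToyKernel> assignToyKernels(bool avx2_and_fma,
                                                     bool avx512f) {
    // Assumed default: the LM kernel for every memory model.
    std::map<ToyMemoryModel, ToyKernel> assignment{
        {ToyMemoryModel::Unaligned, ToyKernel::LM},
        {ToyMemoryModel::Aligned256, ToyKernel::LM},
        {ToyMemoryModel::Aligned512, ToyKernel::LM}};
    if (avx2_and_fma) {
        assignment[ToyMemoryModel::Aligned256] = ToyKernel::AVX2;
        // Without AVX512F, 512-bit aligned memory still benefits from AVX2.
        if (!avx512f) {
            assignment[ToyMemoryModel::Aligned512] = ToyKernel::AVX2;
        }
    }
    if (avx512f) {
        assignment[ToyMemoryModel::Aligned512] = ToyKernel::AVX512;
    }
    return assignment;
}
```

For example, on a CPU with AVX2 and FMA but without AVX512F, both aligned memory models map to the AVX2 kernels, while unaligned memory keeps the default kernel.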