AVX2/512 kernel build system

In Lightning Qubit, a kernel is registered with Pennylane::LightningQubit::DynamicDispatcher when the library is loaded, and it is used at runtime whenever it is the most suitable kernel for the given input.
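Registration at load time is commonly implemented with static initialization: a function that fills a registry is called to initialize a namespace-scope static variable, which is presumably why the registration functions shown later return an int. The sketch below illustrates only that pattern. KernelRegistry, registerDefaultKernels and the two placeholder kernels are hypothetical and are not the Lightning Qubit API, and choosing the most suitable kernel for an input is not shown.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry, NOT Pennylane::LightningQubit::DynamicDispatcher.
class KernelRegistry {
  public:
    using Kernel = std::function<void(std::vector<double> &)>;

    // Function-local static avoids the static-initialization-order problem:
    // the registry is constructed the first time anyone asks for it.
    static KernelRegistry &instance() {
        static KernelRegistry registry;
        return registry;
    }

    void add(const std::string &name, Kernel kernel) {
        kernels_[name] = std::move(kernel);
    }
    bool has(const std::string &name) const {
        return kernels_.count(name) != 0;
    }
    void call(const std::string &name, std::vector<double> &data) const {
        auto it = kernels_.find(name);
        if (it == kernels_.end()) {
            throw std::invalid_argument("unknown kernel: " + name);
        }
        it->second(data);
    }

  private:
    KernelRegistry() = default;
    std::map<std::string, Kernel> kernels_;
};

// Hypothetical placeholder kernels standing in for LM and PI.
int registerDefaultKernels() {
    auto &registry = KernelRegistry::instance();
    registry.add("LM", [](std::vector<double> &v) { for (auto &x : v) x *= 2.0; });
    registry.add("PI", [](std::vector<double> &v) { for (auto &x : v) x *= 2.0; });
    return 1;
}

// Evaluated during static initialization, i.e., when the library is loaded.
[[maybe_unused]] static const int kernelsRegistered = registerDefaultKernels();
```

By the time main (or any caller in another library) runs, both placeholder kernels are already present in the registry.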

To support AVX2 and AVX512 kernels, we always compile them when the target system is a UNIX-like OS on x86-64. Specifically, the AVX2 and AVX512 kernels live in separate C++ source files, which are built as static libraries with the compile options that enable the corresponding instruction set. This is handled by CMake; see the gates/CMakeLists.txt file for details.

One caveat is that the default kernels (KernelType::PI and KernelType::LM) must be instantiated only once, with one specific set of compiler flags. When the same template is instantiated in several translation units, the linker keeps only one of those instantiations and cannot tell which compile options each was built with. This is harmless when every translation unit uses the same options, but the AVX2/512 translation units use different ones, so the linker could pick an instantiation containing AVX2/512 instructions, and the default kernels would then fail on CPUs that lack those instructions. We solve this problem by adding explicit instantiation declarations to the headers of these kernels (GateImplementationsLM.hpp and GateImplementationsPI.hpp) and compiling the matching explicit instantiations as a separate static library.
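The C++ mechanism behind this fix can be sketched in a single file. An explicit instantiation declaration (`extern template`) in the header tells every including translation unit not to instantiate the template itself, and a single explicit instantiation definition, compiled without AVX flags, provides the one copy the linker sees. GateImplementationsDemo and its applyPauliX member are hypothetical stand-ins, not the actual LM/PI kernels; in a real build the two marked parts would live in a header and a separate source file.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

// ---- "header" part (hypothetical GateImplementationsDemo.hpp) ----
template <class PrecisionT> struct GateImplementationsDemo {
    // Apply Pauli-X to `wire` of a `num_qubits`-qubit state vector.
    // Wire 0 is taken to be the most significant bit of the basis index.
    static void applyPauliX(std::vector<std::complex<PrecisionT>> &state,
                            std::size_t num_qubits, std::size_t wire) {
        assert(wire < num_qubits);
        const std::size_t mask = std::size_t{1} << (num_qubits - 1 - wire);
        for (std::size_t i = 0; i < state.size(); ++i) {
            if ((i & mask) == 0) {
                std::swap(state[i], state[i | mask]);
            }
        }
    }
};

// Explicit instantiation declarations: translation units that include the
// header (including AVX2/512 ones) must not emit their own instantiation.
extern template struct GateImplementationsDemo<float>;
extern template struct GateImplementationsDemo<double>;

// ---- "source" part (one .cpp built with the default flags only) ----
// Explicit instantiation definitions: the single copy the linker will use.
template struct GateImplementationsDemo<float>;
template struct GateImplementationsDemo<double>;
```

Placing the definitions in their own static library, as described above, keeps them out of any translation unit compiled with AVX2/512 options.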

With this, the AVX2/512 kernels are always included in the binary when compiled for UNIX-compatible operating systems on x86-64. However, we register them with Pennylane::LightningQubit::DynamicDispatcher only when the runtime environment supports the corresponding instruction sets (AVX2 and FMA for the AVX2 kernels, AVX512F for the AVX512 kernels).

int registerAllAvailableKernels_Float() {
    using Pennylane::Util::RuntimeInfo;
    registerKernel<float, float, Gates::GateImplementationsLM>();
    registerKernel<float, float, Gates::GateImplementationsPI>();

    if (RuntimeInfo::AVX2() && RuntimeInfo::FMA()) {
        registerKernelsAVX2_Float();
    }
    if (RuntimeInfo::AVX512F()) {
        registerKernelsAVX512_Float();
    }
    return 1;
}

int registerAllAvailableKernels_Double() {
    using Pennylane::Util::RuntimeInfo;
    registerKernel<double, double, Gates::GateImplementationsLM>();
    registerKernel<double, double, Gates::GateImplementationsPI>();

    if (RuntimeInfo::AVX2() && RuntimeInfo::FMA()) {
        registerKernelsAVX2_Double();
    }
    if (RuntimeInfo::AVX512F()) {
        registerKernelsAVX512_Double();
    }
    return 1;
}
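The runtime checks above can be sketched with the GCC/Clang x86 built-in `__builtin_cpu_supports`, which is one common way to query CPU features; whether Pennylane::Util::RuntimeInfo uses it is not stated here. CpuFeatures, detectCpuFeatures and kernelsToRegister are hypothetical names, and the decision logic is kept in a pure function so it can be checked without the matching hardware.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical feature summary, not Pennylane::Util::RuntimeInfo.
struct CpuFeatures {
    bool avx2 = false;
    bool fma = false;
    bool avx512f = false;
};

// Query the running CPU; on non-x86 targets or other compilers, report no
// extended features so only the default kernels are used.
CpuFeatures detectCpuFeatures() {
    CpuFeatures f;
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    f.avx2 = __builtin_cpu_supports("avx2");
    f.fma = __builtin_cpu_supports("fma");
    f.avx512f = __builtin_cpu_supports("avx512f");
#endif
    return f;
}

// Mirrors the conditions in registerAllAvailableKernels_*: the default
// kernels are always registered, the SIMD ones only if the CPU supports them.
std::vector<std::string> kernelsToRegister(const CpuFeatures &f) {
    std::vector<std::string> kernels{"LM", "PI"};
    if (f.avx2 && f.fma) {
        kernels.push_back("AVX2");
    }
    if (f.avx512f) {
        kernels.push_back("AVX512");
    }
    return kernels;
}
```

For example, `kernelsToRegister(detectCpuFeatures())` always yields at least the two default kernels, so a binary built with the AVX2/512 code still runs on older CPUs.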

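Likewise, we tell Pennylane::KernelMap::OperationKernelMap to use the AVX2/512 kernels when the state vector uses suitably aligned memory and the CPU supports the corresponding instruction set. Because memory aligned to 512 bits is also aligned to 256 bits, Aligned512 memory falls back to the AVX2 kernels on CPUs that support AVX2 and FMA but not AVX512F.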

int assignKernelsForGateOp() {
    using Pennylane::Util::RuntimeInfo;
    assignKernelsForGateOp_Default();

    if (RuntimeInfo::AVX2() && RuntimeInfo::FMA()) {
        assignKernelsForGateOp_AVX2(CPUMemoryModel::Aligned256);
        // LCOV_EXCL_START
        if (!RuntimeInfo::AVX512F()) {
            assignKernelsForGateOp_AVX2(CPUMemoryModel::Aligned512);
        }
        // LCOV_EXCL_STOP
    }
    // LCOV_EXCL_START
    if (RuntimeInfo::AVX512F()) {
        assignKernelsForGateOp_AVX512(CPUMemoryModel::Aligned512);
    }
    // LCOV_EXCL_STOP
    return 1;
}
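The resulting assignment per memory model can be summarized by a small pure function. This is a sketch, not the OperationKernelMap API: MemoryModel, assignKernels and the kernel labels (including "Default" for whatever assignKernelsForGateOp_Default chooses) are hypothetical, and the real map is keyed per gate operation rather than by memory model alone.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for CPUMemoryModel.
enum class MemoryModel { Unaligned, Aligned256, Aligned512 };

// Same control flow as assignKernelsForGateOp above, returning a label per
// memory model instead of mutating a global kernel map.
std::map<MemoryModel, std::string> assignKernels(bool avx2, bool fma,
                                                 bool avx512f) {
    std::map<MemoryModel, std::string> m{{MemoryModel::Unaligned, "Default"},
                                         {MemoryModel::Aligned256, "Default"},
                                         {MemoryModel::Aligned512, "Default"}};
    if (avx2 && fma) {
        m[MemoryModel::Aligned256] = "AVX2";
        // 512-bit aligned memory is also 256-bit aligned, so AVX2 kernels
        // can serve it when AVX512F is unavailable.
        if (!avx512f) {
            m[MemoryModel::Aligned512] = "AVX2";
        }
    }
    if (avx512f) {
        m[MemoryModel::Aligned512] = "AVX512";
    }
    return m;
}
```

Unaligned memory always keeps the default kernels, since the SIMD kernels rely on aligned loads and stores.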