[RFC] mlir-spirv-runner

Hi everyone,

I am currently working on the SPIR-V to LLVM conversion, and together with @antiagainst and @MaheshRavishankar we have been thinking about adding an “mlir-spirv-runner” tool. This runner would in some ways resemble the CUDA/Vulkan runners and would aim at JIT-compiling SPIR-V via the SPIR-V to LLVM conversion. One of the results I am particularly interested in is executing GPU/SPIR-V modules on the CPU.

Encoding descriptor sets
Kernel arguments in SPIR-V are represented as global variables with descriptor set and binding numbers specified, e.g.

spv.globalVariable @__var bind(0, 1) : ...

I think this information can be encoded in the symbolic reference of the variable:

spv.globalVariable @__var_set0_binding1 : ...

so that we can lower spv.globalVariable to llvm.mlir.global via the existing conversion pass.
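
For completeness, after the SPIR-V to LLVM conversion such a renamed variable would then become an ordinary LLVM dialect global, roughly along these lines (a sketch only; the exact linkage and type depend on the conversion, so the type is elided as above):

llvm.mlir.global external @__var_set0_binding1() : ...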

Pipeline
I see two options for how to structure the pipeline:

  • Nested modules

    The input is a module with a gpu module containing the kernel, a main function, and function declarations for helpers (LHS of the diagram). The outline of the passes is the following:
  1. Convert GPU dialect to SPIR-V dialect
  2. Lower ABI attributes and update VCE triple
  3. Preprocess spv.module so that it can be lowered to LLVM (called SPIRVEncodeDescriptorSets in the diagram for now and described in more detail above)
  4. Convert SPIR-V to LLVM (and drop entry points for now, assuming there are no “internal” functions)
  5. Convert standard to LLVM
  6. Handle the GPU launch op (ConvertGPULaunchToLLVM). For that, we can get the source pointer to the buffer data and the destination pointer of the kernel’s global variable. We would naturally want to transfer the buffer from the host to the device to execute the kernel. But since we are running on the CPU, we instead “emulate” this memory transfer by copying the data to the destination pointer (the global variable in our case) and then executing the kernel, which now has its global variables set up with the data.
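
To make step 6 a bit more concrete, here is a rough sketch of what the emulated “transfer” could lower to, assuming the descriptor set encoding from the previous section (types are elided, and the kernel symbol name is purely illustrative):

// %src and %size come from the host-side buffer; %dst points to the kernel's global.
%dst = llvm.mlir.addressof @__var_set0_binding1 : ...
"llvm.intr.memcpy"(%dst, %src, %size, %isVolatile) : (...) -> ()
// With the memory transfer emulated, the kernel can be called directly on the CPU.
llvm.call @kernel() : () -> ()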

The problem with this approach is that the result of running the passes is a nested module that cannot be translated to proper LLVM IR. To take care of this, we can “embed” the kernel’s module into the main one, resolving possible conflicts in symbolic references.

  • Separate modules

    This approach separates the host code and the device code into two modules in two files. The pipeline is similar to the one above, but at the end we compile the two modules separately into two object files and then link them.

This approach has a number of drawbacks, however:

  1. We would need to specify the variable/kernel declarations in the main module to tell the compiler that those symbols exist in some other module (see the sketch below this list).
  2. More importantly, there is currently no support for crossing module boundaries in MLIR. Handling this is a separate problem and has to be discussed separately.
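
For the first point, the main module would need external declarations roughly along these lines (illustrative only, with types elided):

llvm.mlir.global external @__var_set0_binding1() : ...
llvm.func @kernel()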

I find the second approach more natural, but given the current state of separate-module handling I think that embedding may be preferable.
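
To illustrate the embedding, the idea is to go from the nested structure produced by the passes to a single flat module that the existing translation to LLVM IR can handle (a sketch, with function bodies elided and symbol names purely illustrative):

// Result of running the passes: a nested module that cannot be translated as is.
module {
  module {
    llvm.mlir.global external @__var_set0_binding1() : ...
    llvm.func @kernel() { ... }
  }
  llvm.func @main() { ... }
}

// After embedding: a single flat module, with clashing symbols renamed if necessary.
module {
  llvm.mlir.global external @__var_set0_binding1() : ...
  llvm.func @kernel() { ... }
  llvm.func @main() { ... }
}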

It would be great to hear any other comments on this!

Thanks,
George

I don’t really understand: what is the conceptual difference between mlir-spirv-runner and mlir-vulkan-runner right now?

The Vulkan runner uses the Vulkan API to launch the SPIR-V binary. This one instead uses the SPIR-V converted to LLVM: the converted module is compiled and linked with the object file generated for the host side.

Oh, so this is running on the CPU then? This is more like mlir-cpu-runner with a SPIR-V lowering path? Is there a specific runtime as well, or is it just the lowering pipeline that is different?

Sorry, I probably should have been more specific on the conceptual part.

It is running on the CPU, and the runtime side is the same as in mlir-cpu-runner: lowering to LLVM IR, JIT-compiling, and executing. The difference is in the starting point (and hence the lowering pipeline), which is roughly the same as the one the GPU runners use, and in the actual goal of being able to JIT SPIR-V.