Yep, that one. We’ll still discuss this topic though
For now, I think the next step is a “fresh” PR with a standalone GPU backend, or at least the first parts of it, plus tests, etc., so we can have a proper review. While lambdas and objects with a call operator already work, in the sense that we auto-magically compile them for the GPU target together with their visible transitive dependencies, we still need to look at syntax/APIs for users that “dislike OpenMP”.
It should not be much of a problem: we simply hide the pragmas and such behind agnostic llvm/offload APIs. Note that the below is only needed for complex cases, or when you want to improve performance by reducing memory movement on non-unified-shared-memory systems.
To move an array to the GPU and keep it there to avoid moves when std::par algorithms are used in sequence:
#pragma omp target enter data map(to: Array[0:N]) device(DevNo)
can be hidden in a function like
void llvm_offload_host_to_device(void * ptr, size_t bytes, int device_no);
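A minimal sketch of what such a wrapper could look like on top of the existing OpenMP runtime entry points (`omp_target_alloc`, `omp_target_memcpy`, `omp_target_associate_ptr`); the `llvm_offload_*` name is the placeholder from above, and the real llvm/offload API may well end up looking different:

```cpp
#include <cstddef>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical wrapper hiding the OpenMP mapping machinery: copy `bytes`
// bytes starting at `ptr` to device `device_no` and register the
// host<->device association, so later offloaded code reuses the device
// copy instead of moving the data again.
void llvm_offload_host_to_device(void *ptr, std::size_t bytes, int device_no) {
#ifdef _OPENMP
  if (device_no < 0 || device_no >= omp_get_num_devices())
    return; // No such device; leave the data on the host.
  void *dev = omp_target_alloc(bytes, device_no);
  if (!dev)
    return;
  omp_target_memcpy(dev, ptr, bytes, /*dst_offset=*/0, /*src_offset=*/0,
                    device_no, omp_get_initial_device());
  omp_target_associate_ptr(ptr, dev, bytes, /*device_offset=*/0, device_no);
#else
  // Built without an OpenMP offload runtime: nothing to do.
  (void)ptr; (void)bytes; (void)device_no;
#endif
}
```

A matching `llvm_offload_device_to_host` would do the reverse (`omp_target_disassociate_ptr` plus a copy back) when the user is done with the data on the device.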
The runtime will notice the existing mapping, skip the data movement, and automatically use the array’s device address in the GPU code.
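Put together, the pattern looks roughly like the sketch below. Plain `target` loops stand in here for the offloaded std::par algorithms (so the snippet stays self-contained), and without an offload-enabled toolchain the pragmas are simply ignored and everything runs on the host:

```cpp
#include <cstddef>
#include <vector>

// Keep `Array` resident on device `DevNo` across a sequence of kernels, so
// the data is moved once rather than once per algorithm.
void scale_then_offset(std::vector<float> &v, int DevNo) {
  float *Array = v.data();
  std::size_t N = v.size();

  // One upfront copy to the device (the pragma a wrapper API would hide):
  #pragma omp target enter data map(to: Array[0:N]) device(DevNo)

  // Stand-in for the first std::par algorithm in the sequence:
  #pragma omp target teams distribute parallel for device(DevNo)
  for (std::size_t i = 0; i < N; ++i)
    Array[i] *= 2.0f;

  // Stand-in for the second one; no data movement happens in between:
  #pragma omp target teams distribute parallel for device(DevNo)
  for (std::size_t i = 0; i < N; ++i)
    Array[i] += 1.0f;

  // One copy back at the end:
  #pragma omp target exit data map(from: Array[0:N]) device(DevNo)
}
```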
Similarly, to compile a function for the GPU explicitly, e.g., if it is passed via a function pointer or in a different translation unit:
#pragma omp declare target(fn_foo) [indirect]
could be replaced by an attribute. We have `__device__`, we have `[[omp::declare_target]]`, and we could have something for the llvm offload API; all of them do effectively the same thing.
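For illustration, the different spellings side by side; only the OpenMP pragma works today, the attribute and llvm/offload variants are the hypothetical part:

```cpp
// fn_foo may be defined in a different translation unit or only ever be
// called through a function pointer, so the compiler cannot prove it is
// needed on the GPU; it has to be marked explicitly.
float fn_foo(float x) { return x * x; }

// OpenMP spelling (add the `indirect` clause if fn_foo is invoked through
// a function pointer inside device code):
#pragma omp declare target(fn_foo)

// Other possible spellings of the same request, shown as comments:
//   __device__ float fn_foo(float);              // CUDA/HIP
//   [[omp::declare_target]] float fn_foo(float); // attribute form
//   llvm_offload_declare_target(fn_foo);         // hypothetical llvm/offload API
```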
(Please ignore syntax errors, the idea should be sound.)
Wrt. tests:
We have ~4 GPU buildbots right now; I’ll check with the owners whether they build libc++ and whether we can test the GPU backend there. We are also in the process of setting up CI capabilities on a multi-GPU system, but that might need some more time.
If the libcxx buildbots could be migrated to a GPU system, or if we want to test them via CPU offload, that would be great too. All of the required code, runtimes, etc. are in upstream LLVM, so the only outside dependencies are the GPU “drivers” (ROCm for AMD and CUDA for NVIDIA).