The code below attempts to access heap-allocated memory via a std::shared_ptr, within the body of the value-capturing lambda function given to std::transform.
#include <memory>
#include <vector>
#include <execution>
#include <algorithm>
int main(int argc, char *argv[])
{
    using namespace std;
    const unsigned sz{1024};
    shared_ptr<int> nine{new int{9}};
    vector<unsigned> v(sz);
    const auto pol = execution::par_unseq;
    // The [=] capture copies the shared_ptr into the lambda.
    transform(pol, v.begin(), v.end(), v.begin(), [=](auto) { return *nine; });
    return 0;
}
Compiling with nvc++ fails with the following message:
nvlink error : Undefined reference to '__hxdGetDeviceFunc' in '/tmp/nvc++pDvdTPsfRv7O.o'
pgacclnk: child process exit status 2: /opt/nvidia/hpc_sdk_multi/Linux_x86_64/20.11/compilers/bin/tools/nvdd
The compile command is nvc++ -stdpar -std=c++17 prog.cpp. I’m using the HPC SDK 20.11 and CUDA 11.0 under 64-bit Ubuntu 20.10. I think it’s a bug, but as we plan to use std::shared_ptr in this way, I wanted to double-check.
Some more information on this: std::shared_ptr uses virtual functions in its implementation. Virtual functions are implemented with function pointers, which nvc++ -stdpar does not currently support in device code. Unfortunately, that makes std::shared_ptr unusable in device code.
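To illustrate where the virtual call comes from, here is a deliberately simplified sketch of a type-erased control block of the kind a shared_ptr implementation might use. The names ControlBlockBase, ControlBlock and dispose are hypothetical and are not taken from any real standard library; the only point is that releasing the owned object goes through a virtual function.

```cpp
#include <utility>

// Hypothetical, simplified control block. Real implementations differ.
struct ControlBlockBase {
    long use_count = 1;                   // owners sharing the object
    virtual void dispose() noexcept = 0;  // destroys the owned object
    virtual ~ControlBlockBase() = default;
};

// The deleter type is erased behind the base class, so the smart pointer
// itself does not need to know it.
template <class T, class Deleter>
struct ControlBlock final : ControlBlockBase {
    T* ptr;
    Deleter del;
    ControlBlock(T* p, Deleter d) : ptr(p), del(std::move(d)) {}
    // Reached only through the vtable, i.e. through a function pointer.
    void dispose() noexcept override { del(ptr); }
};
```

A lambda that captures a shared_ptr by value has to destroy its copy when it goes away, and under a design like this that path can end in a call such as dispose(), which is dispatched through a function pointer.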
In the near term, we’ll work on improving the error message so that the cause is more evident. In the long term, we are working on support for function pointers and virtual functions, but this requires support in the CUDA driver, so it will take a while.
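In the meantime, one possible workaround is to keep the shared_ptr on the host side and capture only a raw pointer (or a copy of the value) in the lambda. The sketch below is untested against HPC SDK 20.11. It assumes the pointee was allocated on the heap in code compiled with -stdpar, so that the memory is reachable from the device. The helper name fill_from_shared is hypothetical.

```cpp
#include <algorithm>
#include <execution>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post): fills v with the value
// owned by sp, without copying the shared_ptr into the lambda.
template <class Policy>
void fill_from_shared(Policy&& pol, std::vector<unsigned>& v,
                      const std::shared_ptr<int>& sp)
{
    // Take the raw pointer on the host. The lambda then captures only an
    // int*, so no shared_ptr copy or destructor is needed inside it.
    const int* raw = sp.get();
    std::transform(std::forward<Policy>(pol), v.begin(), v.end(), v.begin(),
                   [raw](auto) { return *raw; });
    // sp is still alive here, so *raw stayed valid for the whole call.
}
```

With the original program this would be called as fill_from_shared(std::execution::par_unseq, v, nine). If only the value is needed, copying *nine into a local int and capturing that avoids the pointer entirely.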