Hi,
I’ve run into a slightly exotic problem when trying to use a device constant variable declared in my main program from a kernel that is compiled separately and loaded as a shared object file. The kernel loaded and launched via the .so file generally works well, but it doesn’t see the changes made to constant memory by cudaMemcpyToSymbol in the main program.
More details:
---- main.cc ----------------
-
__device__ __constant__ float global_constants[4];
-
float constants[] = {8.0f};
cudaMemcpyToSymbol(global_constants, constants, 1 * sizeof(float), 0);
-
load kernel.so containing the CUDA kernel + extern "C" launch function using dlopen()
---- kernel.cu ----------------
The code that goes into kernel.so
[codebox]
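// Same name as the variable in main.cc
__device__ __constant__ float global_constants[4];

// Adds global_constants[0] to every input element.
// Note: no bounds check, so num_elements should be a multiple of the block size.
__global__ void test_kernel(float* g_idata, float* g_odata)
{
    const unsigned int tid = threadIdx.x + blockDim.x * blockIdx.x;
    g_odata[tid] = g_idata[tid] + global_constants[0];
}

// Entry point looked up by the main program after dlopen()
extern "C" int launch_kernel(float *d_idata, float *d_odata, unsigned num_elements)
{
    unsigned num_threads = 128;
    // Round up so that every element is covered by a thread
    dim3 grid( (num_elements + num_threads - 1) / num_threads, 1, 1);
    dim3 threads( num_threads, 1, 1);
    test_kernel<<< grid, threads >>>( d_idata, d_odata);
    return 1;
}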
[/codebox]
Like I said, the kernel generally works fine, except for the problem with accessing global constant memory. Even though the kernel should read 8.0 from global_constants[0], it reads 0.0. I’ve tried putting an “extern” keyword in front of the global_constants declaration in kernel.cu, but that doesn’t help.
Doing an “nm kernel.so | grep global” shows:
0000000000000010 b __shadow_global_constants
The same symbol shows up in main.o.
Usually, for normal variables in shared objects, I think the global_constants variable in kernel.so (at least when declared as extern) would be left undefined and would bind to the global_constants variable defined in the main program during dynamic linking by dlopen(). At least that’s how my pure C test program behaves.
So, somehow nvcc seems to forget to declare (__shadow_)global_constants as extern when generating kernel.so.
If anyone has an idea how to solve this, I’d be grateful.
PS, if I do a
cudaMemcpyToSymbol(global_constants, constants, 1 * sizeof(float), 0);
directly in launch_kernel() in kernel.so, everything works as expected, but I’d like to avoid that.
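To make that workaround concrete: the copy could be wrapped in a small exported setter inside kernel.so, so that the main program still supplies the values while the cudaMemcpyToSymbol call is issued from the module that owns the symbol. This is only a rough sketch. The function name set_constants and its return convention are made up for illustration and are not part of my current code.

```cuda
// Addition to kernel.cu (compiled into kernel.so)
#include <cuda_runtime.h>

__device__ __constant__ float global_constants[4];

// Hypothetical helper (the name set_constants is an assumption):
// copies `count` floats from host memory into the copy of
// global_constants that belongs to this shared object.
// Returns 0 on success, -1 on bad arguments or a CUDA error.
extern "C" int set_constants(const float *values, unsigned count)
{
    if (!values || count > 4)
        return -1;

    cudaError_t err = cudaMemcpyToSymbol(global_constants, values,
                                         count * sizeof(float), 0,
                                         cudaMemcpyHostToDevice);
    return (err == cudaSuccess) ? 0 : -1;
}

// Intended use from main.cc before calling launch_kernel(), where
// `handle` is assumed to be the value returned by dlopen():
//   typedef int (*set_constants_fn)(const float *, unsigned);
//   set_constants_fn set = (set_constants_fn) dlsym(handle, "set_constants");
//   float constants[] = {8.0f};
//   if (set) set(constants, 1);
```

This keeps launch_kernel() itself free of the copy, but it still means the constants have to be pushed through kernel.so rather than set once in the main program, which is what I was hoping to avoid.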
I’m using CUDA 2.1 on Ubuntu 8.10.
/Lars