Hi, I am trying to copy data from host to device, perform math on it, then copy it back from device to host.
Here is an extremely stripped down version of my host code:
ushort cpuInputReferenceArray [5];
ushort *cpuInputPtr;
cpuInputPtr = cpuInputReferenceArray;
ushort cpuOutputReferenceArray [5];
ushort *cpuOutputPtr;
cpuOutputPtr = cpuOutputReferenceArray;
for(int i = 0; i < 5; i++)
{
    cpuInputPtr[i] = i;
}
checkCudaErrors(CopyToFunction(cpuInputPtr));
for(int i = 0; i < 5; i++)
{
    cpuOutputPtr[i] = 0;
}
checkCudaErrors(CopyFromFunction(cpuOutputPtr));
And here is my .cu file code:
__device__ ushort *d_Ptr;
extern "C"
cudaError_t CopyToFunction(ushort* h_Ptr)
{
    cudaError_t error;
    error = cudaMalloc((void**)&d_Ptr, 5 * sizeof(ushort));
    error = cudaMemcpyToSymbol(d_Ptr, &h_Ptr, 5 * sizeof(ushort));
    return error;
}
extern "C"
cudaError_t CopyFromFunction(ushort* h_Ptr)
{
    cudaError_t error;
    error = cudaMemcpyFromSymbol(&h_Ptr, d_Ptr, 5 * sizeof(ushort));
    return error;
}
I know I am doing something wrong, and I assume it has something to do with how I am using __device__ and the scope of my different pointers.
Help with this would be greatly appreciated.
cudaMalloc cannot work on a __device__ variable. Make your d_Ptr an ordinary host variable. When you do that, you won’t use the cudaMemcpy…Symbol operations anymore. Just use ordinary cudaMemcpy.
Also, h_Ptr is already a host pointer, so you don’t need to take its address when using it in a cudaMemcpy-type operation. Passing &h_Ptr hands over the address of the pointer variable itself rather than the address of your data.
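To make that concrete, here is a minimal sketch of the two functions from the question with d_Ptr as a plain host variable and ordinary cudaMemcpy calls. The typedef for ushort, the constant N, and the small main are my own assumptions for a self-contained example, not part of the original code.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

typedef unsigned short ushort;  // assumption: ushort means unsigned short
static const int N = 5;         // assumption: element count from the question

// Ordinary host variable that holds a device address (no __device__ qualifier).
static ushort *d_Ptr = nullptr;

extern "C"
cudaError_t CopyToFunction(ushort *h_Ptr)
{
    cudaError_t error = cudaMalloc((void **)&d_Ptr, N * sizeof(ushort));
    if (error != cudaSuccess) return error;
    // h_Ptr already points at the host data, so it is passed as-is (no &).
    return cudaMemcpy(d_Ptr, h_Ptr, N * sizeof(ushort), cudaMemcpyHostToDevice);
}

extern "C"
cudaError_t CopyFromFunction(ushort *h_Ptr)
{
    return cudaMemcpy(h_Ptr, d_Ptr, N * sizeof(ushort), cudaMemcpyDeviceToHost);
}

int main()
{
    ushort in[N], out[N];
    for (int i = 0; i < N; i++) { in[i] = (ushort)i; out[i] = 0; }

    cudaError_t error = CopyToFunction(in);
    if (error == cudaSuccess) error = CopyFromFunction(out);
    if (error != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(error));
        return 1;
    }
    // The round trip should hand back exactly what was sent.
    for (int i = 0; i < N; i++) assert(out[i] == in[i]);
    cudaFree(d_Ptr);
    return 0;
}
```

A kernel launched between the two calls would take d_Ptr as an argument, since device code cannot see the host variable directly.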
Hi, thank you for the response; that does solve the problem.
However, I was looking to keep the __device__ qualifier on my variable, if that is possible.
You can create a __device__ variable like this:
__device__ ushort d_data[5];
and do your cudaMemcpyToSymbol directly to it, without the cudaMalloc operation.
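As a rough sketch of the statically sized case, the following copies five values into a __device__ array, doubles them in a kernel, and copies them back with cudaMemcpyFromSymbol. The kernel doubleValues, the wrapper doubleOnDevice, the ushort typedef, and N are hypothetical names I introduced for illustration.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

typedef unsigned short ushort;  // assumption: ushort means unsigned short
static const int N = 5;

// Statically sized device array; the symbol itself is the storage.
__device__ ushort d_data[N];

// Hypothetical kernel: each thread doubles one element in place.
__global__ void doubleValues()
{
    int i = threadIdx.x;
    if (i < N) d_data[i] = (ushort)(d_data[i] * 2);
}

// Hypothetical wrapper: host -> symbol, run kernel, symbol -> host.
cudaError_t doubleOnDevice(const ushort *h_in, ushort *h_out)
{
    cudaError_t error = cudaMemcpyToSymbol(d_data, h_in, N * sizeof(ushort));
    if (error != cudaSuccess) return error;

    doubleValues<<<1, N>>>();
    error = cudaGetLastError();  // catches launch failures
    if (error != cudaSuccess) return error;

    // This copy waits for the kernel on the default stream to finish first.
    return cudaMemcpyFromSymbol(h_out, d_data, N * sizeof(ushort));
}

int main()
{
    ushort in[N], out[N];
    for (int i = 0; i < N; i++) { in[i] = (ushort)i; out[i] = 0; }

    cudaError_t error = doubleOnDevice(in, out);
    if (error != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(error));
        return 1;
    }
    for (int i = 0; i < N; i++) assert(out[i] == 2 * in[i]);
    return 0;
}
```

No cudaMalloc or cudaFree appears here because the array's storage exists for the lifetime of the module that defines it.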
If you want something like this:
__device__ ushort *d_Ptr;
as a pointer to a dynamic allocation, it can be done, but it’s more involved. (I would question why you would want to jump through these hoops.) Basically something like this:
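__device__ ushort *d_Ptr;
…
ushort *h_Ptr, *d_Tmp_Ptr; // h_Ptr is assumed to point at 5 ushorts of host data
// allocate device memory through an ordinary host-side pointer
cudaMalloc(&d_Tmp_Ptr, 5 * sizeof(ushort));
// copy the pointer value itself (not the data) into the __device__ symbol
cudaMemcpyToSymbol(d_Ptr, &d_Tmp_Ptr, sizeof(ushort *));
…
// copy the data into the allocation
cudaMemcpy(d_Tmp_Ptr, h_Ptr, 5 * sizeof(ushort), cudaMemcpyHostToDevice);
// now d_Ptr can be used in device code, and it will point to the device allocation for 5 ushorts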
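For completeness, here is one way the outline above might look as a full program, including a kernel that actually dereferences d_Ptr. Treat it as a sketch: the kernel addOne, the wrapper addOneViaDevicePointer, the ushort typedef, and N are names I made up for the example.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

typedef unsigned short ushort;  // assumption: ushort means unsigned short
static const int N = 5;

// Device-resident pointer; it only ever holds an address.
__device__ ushort *d_Ptr;

// Hypothetical kernel: reads and writes the allocation through d_Ptr.
__global__ void addOne()
{
    int i = threadIdx.x;
    if (i < N) d_Ptr[i] = (ushort)(d_Ptr[i] + 1);
}

// Hypothetical wrapper around the steps from the post above.
cudaError_t addOneViaDevicePointer(const ushort *h_in, ushort *h_out)
{
    ushort *d_Tmp_Ptr = nullptr;
    cudaError_t error = cudaMalloc((void **)&d_Tmp_Ptr, N * sizeof(ushort));
    if (error != cudaSuccess) return error;

    // Store the address in the symbol: only sizeof(ushort *) bytes move.
    error = cudaMemcpyToSymbol(d_Ptr, &d_Tmp_Ptr, sizeof(ushort *));
    // Move the data itself with ordinary cudaMemcpy.
    if (error == cudaSuccess)
        error = cudaMemcpy(d_Tmp_Ptr, h_in, N * sizeof(ushort), cudaMemcpyHostToDevice);
    if (error == cudaSuccess) {
        addOne<<<1, N>>>();
        error = cudaGetLastError();
    }
    if (error == cudaSuccess)
        error = cudaMemcpy(h_out, d_Tmp_Ptr, N * sizeof(ushort), cudaMemcpyDeviceToHost);

    cudaFree(d_Tmp_Ptr);  // after this, d_Ptr holds a stale address
    return error;
}

int main()
{
    ushort in[N], out[N];
    for (int i = 0; i < N; i++) { in[i] = (ushort)i; out[i] = 0; }

    cudaError_t error = addOneViaDevicePointer(in, out);
    if (error != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(error));
        return 1;
    }
    for (int i = 0; i < N; i++) assert(out[i] == in[i] + 1);
    return 0;
}
```

Note that the host keeps its own copy of the address in d_Tmp_Ptr, which is what makes the plain cudaMemcpy and cudaFree calls possible.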
takeaways:
- You cannot do cudaMalloc on a pointer that is resident on the device.
- cudaMemcpy…Symbol operations use the symbol as the source or destination of the copy. They cannot use the symbol as a pointer to somewhere else in device memory to copy there directly.
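The second takeaway also suggests how a function like CopyFromFunction could work when only the __device__ pointer is available: first copy the stored address out of the symbol, then use it with ordinary cudaMemcpy. The sketch below assumes that approach; publishAndUpload, downloadViaSymbol, the ushort typedef, and N are hypothetical names of mine.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

typedef unsigned short ushort;  // assumption: ushort means unsigned short
static const int N = 5;

__device__ ushort *d_Ptr;

// Hypothetical helper: allocate, store the address in the symbol, upload data.
cudaError_t publishAndUpload(const ushort *h_in)
{
    ushort *d_tmp = nullptr;
    cudaError_t error = cudaMalloc((void **)&d_tmp, N * sizeof(ushort));
    if (error != cudaSuccess) return error;
    error = cudaMemcpyToSymbol(d_Ptr, &d_tmp, sizeof(ushort *));
    if (error != cudaSuccess) return error;
    return cudaMemcpy(d_tmp, h_in, N * sizeof(ushort), cudaMemcpyHostToDevice);
}

// Hypothetical helper: recover the address from the symbol, then download.
cudaError_t downloadViaSymbol(ushort *h_out)
{
    ushort *d_tmp = nullptr;
    // Step 1: the symbol is the source, so this yields the stored address,
    // not the array it points to.
    cudaError_t error = cudaMemcpyFromSymbol(&d_tmp, d_Ptr, sizeof(ushort *));
    if (error != cudaSuccess) return error;
    // Step 2: ordinary device-to-host copy from that address.
    error = cudaMemcpy(h_out, d_tmp, N * sizeof(ushort), cudaMemcpyDeviceToHost);
    if (error != cudaSuccess) return error;
    return cudaFree(d_tmp);  // release the allocation once the data is back
}

int main()
{
    ushort in[N], out[N];
    for (int i = 0; i < N; i++) { in[i] = (ushort)i; out[i] = 0; }

    cudaError_t error = publishAndUpload(in);
    if (error == cudaSuccess) error = downloadViaSymbol(out);
    if (error != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(error));
        return 1;
    }
    for (int i = 0; i < N; i++) assert(out[i] == in[i]);
    return 0;
}
```

Keeping a host-side copy of the pointer is simpler in most cases; this variant only helps when the host code that downloads the data never saw the original cudaMalloc result.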
Hi, Thank you very much, that solves my problem and I appreciate it greatly.