CUDA with Pthread

samritmaity · July 23, 2008, 9:55am

Hi all,
please see my code. I am allocating device memory in main program. Then passing the pointer [ pointer to memory on device ] to a thread. From within this thread I want to access device memory or want to do data transfer from host memory to device memory.
For the following code i am getting “invalid device pointer” error at runtime. Can anybody tell me , where am I going wrong and how to solve it…

//////////////////////////////////////////////code////////////////////////////////////////////////
//------------------------global section------------------------
float *hMatForMult;
float *dMatForMult;
int matDimension = 0;
//--------------------------------------------------------------------

void doWorkCudaMultiplication(void ptr)
{
hMatForMult = new float[matDimension * matDimension];
SAFE_CALL( cudaMemcpy( ptr , (void*)hMatForMult,matDimension * matDimension * sizeof(float), cudaMemcpyHostToDevice ));
pthread_exit(NULL);
}// end of doWorkCuda

int main(int argc, char* argv)
{
pthread_t threadPtrMult;
matDimension = 100;

//---------allocating memory in device.
cudaMalloc((void**)&dMatForMult , matDimension * matDimension * sizeof(float)) ;

int errorCudaThread = pthread_create(&threadPtrMult, NULL, doWorkCudaMultiplication,(void*) dMatForMult);

if( errorCudaThread )
{
cout<<"\n Error : return code from pthread_create() is "<<errorCudaThread;
exit( -1);
}
pthread_join(threadPtrMult,NULL);

pthread_exit(NULL);

}// end of main

//////////////////////////////////////output/////////////////////////////////
in line 80 : invalid device pointer. External Image
[ line 80 is where we are calling cudaMemcpy() ]

sagrailo · July 23, 2008, 3:13pm

I guess the problem is that you’re allocating device memory in one thread, and use it from the other thread (at least, invalid device pointer seems to be reported on Linux because of that in your code)… You’d probably like to structure your code in following way (I tried to correct your code as much as possible, but still it’s hard to do it when one does not know what is to be accomplished with the given code; also, as a side note and general advice: remember to always check return value of any function returning some kind of error indication):

#include <assert.h>

#include <stdlib.h>

#include <pthread.h>

#include <cuda_runtime.h>

#define SIZE 100

typedef struct {

    int             size_;

    float          *values_;

} Matrix;

void           *

run(void *arguments)

{

    cudaError       error;

   Matrix         *matrix = (Matrix *) arguments;

    int             size = matrix->size_;

    float          *values = matrix->values_;

   // probably set CUDA device here, as there is no reason to mess with

    // pthreads except to be able to run kernel on multiple devices...

   float          *values_d;

    error = cudaMalloc((void **) &values_d, size * size * sizeof(float));

    assert(error == cudaSuccess);

   error =

	cudaMemcpy(values_d, values, size * size * sizeof(float),

     cudaMemcpyHostToDevice);

    assert(error == 0);

   // probably launch kernel using "values_d" here...

   // probably copy results to host memory here...

   error = cudaFree(values_d);

    assert(error == 0);

   pthread_exit(NULL);

}

int

main(void)

{

    int             error;

   pthread_attr_t  attributes;

    error = pthread_attr_init(&attributes);

    assert(error == 0);

    error =

	pthread_attr_setdetachstate(&attributes, PTHREAD_CREATE_JOINABLE);

    assert(error == 0);

   Matrix         *matrix = (Matrix *) malloc(sizeof(Matrix));

    assert(matrix != NULL);

   matrix->size_ = SIZE;

    matrix->values_ = (float *) malloc(SIZE * SIZE * sizeof(float));

    assert(matrix->values_ != NULL);

   // probably initialize matrix->values_ here...

   pthread_t       thread;

    error = pthread_create(&thread, &attributes, run, matrix);

    assert(error == 0);

   pthread_attr_destroy(&attributes);

   error = pthread_join(thread, NULL);

    assert(error == 0);

   // probably collect results from all threads here...

   free(matrix->values_);

    free(matrix);

   pthread_exit(NULL);

}

tmurray · July 23, 2008, 3:35pm

You’re passing around device contexts between threads–you need to use the thread migration API introduced in 2.0b.

samritmaity · July 24, 2008, 4:20am

Thanks for your reply. I will modify my code the way you suggested.

samritmaity · July 24, 2008, 6:21am

Dear tmurray,
Thank for your reply. I got some indication regarding error in my code. I would like to know , which specific APIs , I can use to solve this problem. External Image

Not able to find Thread Migration API in 2.0 reference manual. :sad:
sam

E.D_Riedijk · July 24, 2008, 9:13am

there is an example in the SDK

Topic		Replies	Views
Invalid Device Pointer CUDA Programming and Performance	3	1269	June 5, 2009
Pinned memory error invalid device pointer CUDA Programming and Performance	9	6086	April 10, 2009
seems that cuda doesn't support pointer to pointer problem report CUDA Programming and Performance	11	11713	March 29, 2012
Multithreading and CUDA CUDA Programming and Performance	6	9085	April 14, 2010
Pointer arithmetic in device memory CUDA Programming and Performance	3	9198	August 22, 2007
2d matrix passing values help with this code CUDA Programming and Performance	4	3209	November 10, 2010
cudaMemcpy error invalid device pointer CUDA Programming and Performance	2	4899	January 23, 2009
Invalid Device Pointer CUDA Programming and Performance	9	24504	January 15, 2009
cudaMalloc and threads "invalid device pointer" error CUDA Programming and Performance	4	5450	June 26, 2007
C++ DLL - Invalid Device Pointer CUDA Programming and Performance	4	3486	August 12, 2009

CUDA with Pthread

Related topics