Hello!
I’m a newbie in CUDA. How can I use several devices at the same time in CUDA code? I use the latest version, CUDA 4.0.
Thanks for your attention.
Hi,
One option is to use one thread per GPU.
Excuse me, it isn’t clear to me. Do you have an example? Suppose that I want to use ex1_kernel on 2 GPUs.
I do this, but it seems that the execution is sequential rather than simultaneous:
dim3 dimBlock(BlockSize);
dim3 dimGrid((n+BlockSize-1)/BlockSize);
dim3 dimGrida((nnz+BlockSize-1)/BlockSize);
dim3 dimGridTest(65535);
cudaGetDevice(0);
ex1_kernel<<<dimGridTest, dimBlock >>>(…variables allocated on device(0)…);
cutilSafeCall(cudaMemcpy(…cudaMemcpyDeviceToHost));
cudaGetDevice(1);
ex1_kernel<<<dimGridTest, dimBlock >>>(…variables allocated on device(1)…);
cutilSafeCall(cudaMemcpy(…cudaMemcpyDeviceToHost));
Thank you.
Hi,
When I say thread, I mean a host thread on the CPU side. Each thread handles its own GPU.
You need to pick a threading library that you want to use.
In CUDA 3.2 there is an SDK example, “Simple multi-GPU”, that could be helpful. Maybe the CUDA 4.0 SDK has the same.
In CUDA 4.0 I know that it is possible to switch contexts within the same thread, but I can’t give you any help in that case. I have not tried 4.0 yet.
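To make the one-thread-per-GPU idea concrete, here is a rough, untested sketch using pthreads as the thread library. ex1_kernel, the buffer names, and the sizes below are just placeholders (not your actual variables), and error checking is left out:

#include <cuda_runtime.h>
#include <pthread.h>
#include <stdio.h>

__global__ void ex1_kernel(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;                    // placeholder work
}

struct GpuJob { int device; int n; };

static void *worker(void *arg)
{
    GpuJob *job = (GpuJob *)arg;
    cudaSetDevice(job->device);                 // bind this host thread to its own GPU

    size_t bytes = job->n * sizeof(float);
    float *d_buf, *h_buf;
    cudaMalloc(&d_buf, bytes);
    h_buf = (float *)malloc(bytes);

    dim3 dimBlock(256);
    dim3 dimGrid((job->n + dimBlock.x - 1) / dimBlock.x);
    ex1_kernel<<<dimGrid, dimBlock>>>(d_buf, job->n);
    cudaMemcpy(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost);   // copy result back

    cudaFree(d_buf);
    free(h_buf);
    return NULL;
}

int main(void)
{
    pthread_t threads[2];
    GpuJob jobs[2] = { {0, 1 << 20}, {1, 1 << 20} };

    // One host thread per GPU: both kernels run at the same time,
    // instead of one after the other as in a single-threaded loop.
    for (int i = 0; i < 2; ++i)
        pthread_create(&threads[i], NULL, worker, &jobs[i]);
    for (int i = 0; i < 2; ++i)
        pthread_join(threads[i], NULL);

    printf("done\n");
    return 0;
}

The important part is that each host thread calls cudaSetDevice before doing any CUDA work, so each thread gets its own device and the two GPUs work concurrently.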
OK, thanks for your help.