Does CUDA allow, and is it normal practice, to run distinct kernels concurrently on separate streams on a single GPU?
The Stream Management section (4.5.2.4) of the NVIDIA CUDA Programming Guide, Version 2.1, shows an example similar to:
kernel<<<100, 512, 0, stream[0]>>>(out, in);
kernel<<<100, 512, 0, stream[1]>>>(out + size, in + size);
where the same kernel is launched on each stream (over disjoint slices of the data). But is it reasonable to do:
kernel_1<<<100, 512, 0, stream[0]>>>(out, in);
kernel_2<<<100, 512, 0, stream[1]>>>(out + size, in + size);
where kernel_1 and kernel_2 are different?
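For concreteness, here is a minimal sketch of what I have in mind (the kernel bodies are hypothetical placeholders; the point is only that two *different* kernels are queued on two streams):

```cuda
#include <cuda_runtime.h>

// Two distinct kernels (bodies are arbitrary examples).
__global__ void kernel_1(float *out, const float *in) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = in[i] * 2.0f;
}

__global__ void kernel_2(float *out, const float *in) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = in[i] + 1.0f;
}

int main() {
    const int size = 100 * 512;              // elements handled per stream
    float *in, *out;
    cudaMalloc(&in,  2 * size * sizeof(float));
    cudaMalloc(&out, 2 * size * sizeof(float));

    cudaStream_t stream[2];
    for (int i = 0; i < 2; ++i)
        cudaStreamCreate(&stream[i]);

    // Different kernels on different streams, over disjoint data slices.
    kernel_1<<<100, 512, 0, stream[0]>>>(out,        in);
    kernel_2<<<100, 512, 0, stream[1]>>>(out + size, in + size);

    // cudaThreadSynchronize() in the CUDA 2.1-era API
    // (cudaDeviceSynchronize() in newer toolkits).
    cudaThreadSynchronize();

    for (int i = 0; i < 2; ++i)
        cudaStreamDestroy(stream[i]);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Is launching distinct kernels this way legal, and is it common practice? (I understand that whether they actually execute concurrently, rather than merely being queued independently, depends on the hardware.)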
Thanks.