I am still new to CUDA.
I have a project with an array A of length N and a second array B.
I want to divide array A into segments, with each segment assigned to a specific thread block.
For example:
A[12]={1,4,5,3,6,3,2,5,2,1,1,2};
B[3]={3,6,11};
I want to launch three blocks: the first block calculates the sum of elements 0-3, the second block calculates the sum of elements 4-6, and the third block calculates the sum of elements 7-11.
I am not asking for code; I am asking for an algorithm.
Launch 3 blocks. In each block do a block-level reduction on the appropriate data set. Use the blockIdx.x built-in variable in your CUDA kernel to select the appropriate element of an array that defines the data boundaries.
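For reference, here is a minimal sketch of that idea, not a tuned implementation. It assumes B holds the inclusive last index of each segment (as in the question's example), that B is non-empty and sorted, and that the block size is a power of two. The names segmentSumKernel and segmentSums are hypothetical, and CUDA error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: one block per segment. ends[] plays the role of B and
// holds the inclusive last index of each segment (an assumption).
__global__ void segmentSumKernel(const int *a, const int *ends, int *sums)
{
    extern __shared__ int partial[];  // one slot per thread in the block

    // blockIdx.x selects which segment boundaries this block uses.
    const int seg   = blockIdx.x;
    const int first = (seg == 0) ? 0 : ends[seg - 1] + 1;
    const int last  = ends[seg];

    // Each thread accumulates a strided slice of its block's segment.
    int acc = 0;
    for (int i = first + threadIdx.x; i <= last; i += blockDim.x)
        acc += a[i];
    partial[threadIdx.x] = acc;
    __syncthreads();

    // Shared-memory tree reduction; assumes blockDim.x is a power of two.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }

    // Thread 0 writes the block's result.
    if (threadIdx.x == 0)
        sums[seg] = partial[0];
}

// Hypothetical host wrapper: copies the data over, launches one block per
// segment, and returns the per-segment sums. Error checking is omitted.
std::vector<int> segmentSums(const std::vector<int> &a,
                             const std::vector<int> &ends)
{
    const int numSegs = static_cast<int>(ends.size());
    const int threads = 128;  // power of two, as the reduction assumes

    int *dA = nullptr, *dEnds = nullptr, *dSums = nullptr;
    cudaMalloc(&dA, a.size() * sizeof(int));
    cudaMalloc(&dEnds, numSegs * sizeof(int));
    cudaMalloc(&dSums, numSegs * sizeof(int));
    cudaMemcpy(dA, a.data(), a.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dEnds, ends.data(), numSegs * sizeof(int), cudaMemcpyHostToDevice);

    // Third launch parameter is the dynamic shared memory size in bytes.
    segmentSumKernel<<<numSegs, threads, threads * sizeof(int)>>>(dA, dEnds, dSums);

    std::vector<int> sums(numSegs);
    cudaMemcpy(sums.data(), dSums, numSegs * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dEnds);
    cudaFree(dSums);
    return sums;
}
```

With the A and B from the question, segmentSums should return 13, 11, and 11. For segments this small a single thread per segment would do just as well; the block-level reduction only pays off when each segment is long.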
You can write your own block-level reduction, but the CUDA reduction sample code is a good thing to review: