Dynamically allocating memory inside a __device__/__global__ CUDA kernel

I have a kernel that compares every row in a matrix with the first row. The matrix is created dynamically based on user settings, so its size is not known at compile time. At first I tried dynamically allocating a shared variable, but my GPU has Compute Capability 1.1, so I can't do that.

Is there another way to do something like this?

As far as I know, dynamically allocating shared memory has been a feature of CUDA since 1.0.

Best,
Pablo.

Could you show me some sample code? I tried dynamically allocating shared memory but failed.

Dynamic shared memory allocation in CUDA is performed by declaring the shared array as extern inside the kernel function:

__global__ void kernel_function(...)
{
    extern __shared__ int a[];
}

You should then pass the number of bytes of shared memory to be allocated as the third argument of the kernel launch configuration:

kernel_function<<<gridDim, blockDim, a_size>>>(...)
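
Putting the two pieces together, a minimal self-contained sketch could look as follows. The data written to the array (twice the thread index) and the launch sizes are purely illustrative, and error checking is omitted for brevity:

#include <cstdio>

__global__ void kernel_function(int *out, int n)
{
    // The size of this array is set by the third launch parameter.
    extern __shared__ int a[];

    int tid = threadIdx.x;
    if (tid < n)
        a[tid] = 2 * tid;   // stage some data in shared memory
    __syncthreads();

    if (tid < n)
        out[tid] = a[tid];  // copy it back out to global memory
}

int main()
{
    int n = 256;                      // size known only at run time
    size_t a_size = n * sizeof(int);  // bytes of dynamic shared memory

    int *d_out;
    cudaMalloc((void **)&d_out, a_size);
    kernel_function<<<1, n, a_size>>>(d_out, n);
    cudaDeviceSynchronize();

    int h_out[256];
    cudaMemcpy(h_out, d_out, a_size, cudaMemcpyDeviceToHost);
    printf("out[5] = %d\n", h_out[5]);  // expect 10
    cudaFree(d_out);
    return 0;
}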

Dynamic shared memory allocation cannot be performed with a static declaration like

__shared__ int a[a_size];

when a_size is not known at compile time. A compile-time constant size, such as

__shared__ int a[100];

is fine, on the other hand.
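
Applied to the original question, a kernel that compares each row against the first row could then look roughly like the sketch below. It rests on a few assumptions not stated in the question: the matrix is stored row-major in global memory, one block handles one row, width fits in a single block (at most 512 threads on Compute Capability 1.1), and the host initializes every entry of row_equal to 1 before the launch:

__global__ void compare_rows(const int *matrix, int *row_equal, int width)
{
    // First row of the matrix, sized at launch as width * sizeof(int).
    extern __shared__ int first_row[];

    int col = threadIdx.x;
    int row = blockIdx.x;

    // Stage the first row in shared memory once per block.
    if (col < width)
        first_row[col] = matrix[col];
    __syncthreads();

    // Any mismatch clears the flag; all racing threads store the same
    // value 0, so the concurrent writes are benign.
    if (col < width && matrix[row * width + col] != first_row[col])
        row_equal[row] = 0;
}

launched with one block per row and the dynamic shared memory size as the third parameter:

compare_rows<<<num_rows, width, width * sizeof(int)>>>(d_matrix, d_row_equal, width);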