When compiling the following code (using the PGI 15.1 compiler):
#ifdef _OPENACC
#pragma acc kernels
#pragma acc loop independent
#endif
for( int i( 0 ); i < 100; ++i )
{
#ifdef _OPENACC
#pragma acc loop independent
#endif
  for( int j( 0 ); j < 100; ++j )
  {
    const float v[3] = { 1., 1., 1. };
  }
}
I get the following messages from the compiler:
pgc++ -fast -Minfo=all -acc -ta:nvidia array.cc
"array.cc", line 21: warning: variable "v" was declared but never referenced
  const float v[3] = { 1., 1., 1. };
              ^
main:
  7, Generating copyout(v[:])
     Generating Tesla code
  14, Loop is parallelizable
  19, Loop is parallelizable
     Accelerator kernel generated
     14, #pragma acc loop gang /* blockIdx.y */
     19, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
Why does the compiler issue a copyout statement for a loop-local variable?
When I leave out the independent clause from the inner loop, the compiler gives the following messages:
main:
  7, Generating copyout(v[:])
     Generating Tesla code
  14, Loop is parallelizable
  19, Loop carried reuse of v prevents parallelization
     Inner sequential loop scheduled on accelerator
     Accelerator kernel generated
     14, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
Why does this happen? Doesn’t the compiler generate a copy of v for each thread?
If a scalar is used instead of an array, the problem doesn’t occur. Why?
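For comparison, here is a minimal sketch of the scalar variant I mean, wrapped in a function so that it is self-contained. The function name scalar_variant, the output array out, and the size parameter n are only illustrative assumptions and are not part of my actual code; I also write v to out here so the result can be checked, which the original loop does not do.

```cpp
// Hypothetical helper (not from the original program): the same loop nest
// as above, but with a loop-local scalar instead of the loop-local array v.
void scalar_variant( float* out, int n )
{
#ifdef _OPENACC
#pragma acc kernels copyout( out[0:n*n] )
#pragma acc loop independent
#endif
  for( int i( 0 ); i < n; ++i )
  {
#ifdef _OPENACC
#pragma acc loop independent
#endif
    for( int j( 0 ); j < n; ++j )
    {
      const float v = 1.f;     // scalar declared inside the inner loop
      out[ i * n + j ] = v;    // store it so the loop has a visible effect
    }
  }
}
```

When built without -acc, the pragmas are skipped by the _OPENACC guards and the loops simply run on the host, so the sketch can also be checked with any C++ compiler.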
Thank you.
L