Parallel for loop: Internal compiler error

Hi,

I’m trying to compile a simple loop where the iteration bounds are not fixed:

#pragma acc region
{
#pragma acc for
    for (int y = ytl; y < ybr; y++) {
        for (int x = xtl; x < xbr; x++) {
            float v1 = a[x + y*width];
            float v2 = b[x + y*width];
            sum += (v1 - v2);
        }
    }
}

However, the compiler exits with an internal compiler error:

PGC-F-0000-Internal compiler error. unknown reference      15 (test.c: 46)
PGC/x86-64 Linux 10.0-0: compilation aborted

The loop is inside of a function ending at line number 46.
When I replace xtl with a constant, the compiler generates working code for CUDA. Is there any way to get around this issue?

Best regards,
Richard

Hi Richard,

I tried to recreate the ICE but was unable to. If you could send example code that reproduces the error to PGI customer service ([email protected]), I would appreciate it.

What I did encounter was that I needed to specify the bounds of the a and b arrays in the copyin clause. It’s possible that you need to do the same. For example:

% cat test.c
#include <stdio.h>
#include <accel.h>

float foo (float *a, float *b, int xtl, int ytl, int ybr,int xbr,int width) {
float sum = 0.0f;  /* initialize the accumulator before the reduction */

#pragma acc region copyin(a[0:xtl], b[0:xtl])
{
    for (int y = ytl; y < ybr; y++) {
        for (int x = xtl; x < xbr; x++) {
            float v1 = a[x + y*width];
            float v2 = b[x + y*width];
            sum += (v1 - v2);
        }
    }
}
return sum;
}
% pgcc -c test.c -Msafeptr -ta=nvidia -V10.0 -Minfo=accel
foo:
      9, Generating copyin(b[:xtl])
         Generating copyin(a[:xtl])
     11, Loop is parallelizable
         Accelerator kernel generated
         11, #pragma acc for parallel, vector(256)
         15, Sum reduction generated for sum
     12, Loop is parallelizable

Hope this helps,
Mat

Hi Mat,

Using the copyin clause was the right hint!
I still had some problems because I stored some of the parameters in structs, which gave the same error message. Using normal variables instead solved the problem!
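
For anyone who hits the same error, here is a minimal sketch of the struct workaround. The `Rect` type, the `diff_sum` name, and the `last` helper variable are hypothetical and only for illustration. The idea is to copy the struct members into plain local variables before the region, so the loop bounds and the copyin clause only refer to scalars. The sketch assumes that the bounds in the copyin clause are given as lower:upper index, and it guards the pragma with `__PGI` so that other compilers simply build the host loop.

```c
/* Hypothetical parameter struct; field names mirror the loop variables above. */
typedef struct {
    int xtl, ytl;   /* top-left corner (inclusive) */
    int xbr, ybr;   /* bottom-right corner (exclusive) */
    int width;      /* row stride of both arrays */
} Rect;

float diff_sum(float *a, float *b, const Rect *r) {
    /* Copy the struct members into plain locals BEFORE the region, so the
       accelerator region never references the struct itself. */
    const int xtl = r->xtl, ytl = r->ytl;
    const int xbr = r->xbr, ybr = r->ybr;
    const int width = r->width;

    /* Highest array index touched by the loops (assumed upper bound
       for the copyin clause). */
    const int last = (xbr - 1) + (ybr - 1) * width;

    float sum = 0.0f;

#ifdef __PGI
#pragma acc region copyin(a[0:last], b[0:last])
#endif
    {
        for (int y = ytl; y < ybr; y++) {
            for (int x = xtl; x < xbr; x++) {
                float v1 = a[x + y * width];
                float v2 = b[x + y * width];
                sum += (v1 - v2);
            }
        }
    }
    return sum;
}
```

Calling `diff_sum(a, b, &rect)` from host code then works the same way as the original function; only the scalars cross into the region.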

Richard