Bug with !$acc routine seq?

Hi,

I’m getting quite confused by a code that I’m trying to parallelize with OpenACC. For simplicity, I have reduced it to this toy example:

module test_m
  implicit none
contains
  
  function dou(a)
    !$acc routine seq
    double precision :: dou,a

    dou=a*2.0
  end function dou
end module test_m


program test
  use test_m
  implicit none
  
  double precision, dimension(1000,1000) :: a,b
  integer :: i,j
  
  a=0.0 ; b=0.0
   
  !$acc data copy(a,b)
  !$acc parallel loop collapse(2) default(none) private(i,j)
  do i=1,1000
     do j=1,1000
        a(i,j) = (i*1000.0)*j
        b(i,j) = (i*1000.0)*j

        a(i,j) = 2.0*a(i,j)
        b(i,j) = dou(b(i,j))
     end do
  end do
  !$acc end parallel loop
  !$acc end data
  
  print*, sum(a), sum(b)
end program test

First, when I compile it, I get a message saying that the loop could not be parallelized because it contains a call, which looks strange, since I already added !$acc routine seq to the only function being called.

[test]$ pgf90 --version
pgf90 18.10-1 64-bit target on x86-64 Linux -tp haswell

[test]$ pgf90 -acc -Minfo  -O3 test.f90 -o test
dou:
      5, Generating acc routine seq
         Generating Tesla code
test:
     21, Memory zero idiom, array assignment replaced by call to pgf90_mzero8
     23, Generating copy(b(:,:),a(:,:))
     24, Accelerator kernel generated
         Generating Tesla code
         25, !$acc loop gang, vector(128) collapse(2) ! blockidx%x threadidx%x
         26,   ! blockidx%x threadidx%x collapsed
     26, Loop not vectorized/parallelized: contains call
     36, sum reduction inlined
         Generated vector simd code for the loop containing reductions
         Generated a prefetch instruction for the loop
         Generated vector simd code for the loop containing reductions
         Generated a prefetch instruction for the loop
[test]$

More worryingly, the results depend on the order of the assignments. When I run the code as shown, the output matches the serial version:

[test]$ ./test
    501000499999440.0         501000499999440.0

But if I just reverse the order of the two assignments in the main loop, I get incorrect results:

[test]$ diff -C 2 test2.f90 test.f90
*** test2.f90   Mon Apr 29 10:05:51 2019
--- test.f90    Mon Apr 29 10:05:42 2019
***************
*** 28,33 ****
          b(i,j) = (i*1000.0)*j

-         b(i,j) = dou(b(i,j))
          a(i,j) = 2.0*a(i,j)
       end do
    end do
--- 28,33 ----
          b(i,j) = (i*1000.0)*j

          a(i,j) = 2.0*a(i,j)
+         b(i,j) = dou(b(i,j))
       end do
    end do
[test]$ pgf90 -acc -O3 test2.f90 -o test2
[test]$ ./test2
    501000499999440.0         0.000000000000000

This looks like a textbook example, so I don’t understand how I can get this strange behaviour…

Ángel de Vicente

Hi Angel,

26, Loop not vectorized/parallelized: contains call

This is coming from the host side, where the compiler can’t vectorize or auto-parallelize the loop due to the call. It’s extraneous to the device code generation. To see only the OpenACC compiler feedback messages, use the option “-Minfo=accel”.
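For example, the compile line from your post, narrowed to the accelerator feedback, would look like:

[test]$ pgf90 -acc -Minfo=accel -O3 test.f90 -o test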

then I get incorrect results:

This does look like a compiler issue where it’s incorrectly optimizing away some of the offsets into the fixed-size array. I have therefore filed a problem report (TPR #27100) and sent it to our compiler engineers for further investigation.

The error seems to occur only with our older non-LLVM-based compilers, so you can work around the problem by using the LLVM back end. We made LLVM the default in 19.1, but in 18.10 and earlier you need to add the flag “-Mllvm” to your compile line, or set your PATH to “$PGI/linux86-64-llvm/18.10/bin” to get LLVM.
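For instance, with 18.10 the compile line becomes:

[test]$ pgf90 -acc -Mllvm -O3 test.f90 -o test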

Also, the error seems to occur only with fixed-size arrays, so another workaround is to make “a” and “b” allocatable arrays (see the sketch below).
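Here is a minimal sketch of that workaround applied to your main program (the test_m module is unchanged; only the declarations plus an allocate/deallocate pair differ from the original):

program test
  use test_m
  implicit none

  ! allocatable instead of fixed-size arrays
  double precision, dimension(:,:), allocatable :: a,b
  integer :: i,j

  allocate(a(1000,1000), b(1000,1000))
  a=0.0 ; b=0.0

  !$acc data copy(a,b)
  !$acc parallel loop collapse(2) default(none) private(i,j)
  do i=1,1000
     do j=1,1000
        a(i,j) = (i*1000.0)*j
        b(i,j) = (i*1000.0)*j

        a(i,j) = 2.0*a(i,j)
        b(i,j) = dou(b(i,j))
     end do
  end do
  !$acc end parallel loop
  !$acc end data

  print*, sum(a), sum(b)

  deallocate(a,b)
end program test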

Thanks for the report!
Mat

Hello,

OK, I will use -Minfo=accel for the time being, then.

Also, I can confirm that with 18.7-0 the incorrect output was still there without -Mllvm, but that with -Mllvm all seems fine.

Many thanks,
AdV