Hi,
I’m confused by the behaviour of a code that I’m trying to parallelize with OpenACC. To keep things simple, I reduced it to this toy example:
module test_m
  implicit none
contains
  function dou(a)
    !$acc routine seq
    double precision :: dou,a
    dou=a*2.0
  end function dou
end module test_m

program test
  use test_m
  implicit none
  double precision, dimension(1000,1000) :: a,b
  integer :: i,j

  a=0.0 ; b=0.0

  !$acc data copy(a,b)
  !$acc parallel loop collapse(2) default(none) private(i,j)
  do i=1,1000
     do j=1,1000
        a(i,j) = (i*1000.0)*j
        b(i,j) = (i*1000.0)*j
        a(i,j) = 2.0*a(i,j)
        b(i,j) = dou(b(i,j))
     end do
  end do
  !$acc end parallel loop
  !$acc end data

  print*, sum(a), sum(b)
end program test
First, when I compile it, I get the message "Loop not vectorized/parallelized: contains call". This looks strange, since I already added !$acc routine seq to the only function that is called.
[test]$ pgf90 --version
pgf90 18.10-1 64-bit target on x86-64 Linux -tp haswell
[test]$ pgf90 -acc -Minfo -O3 test.f90 -o test
dou:
      5, Generating acc routine seq
         Generating Tesla code
test:
     21, Memory zero idiom, array assignment replaced by call to pgf90_mzero8
     23, Generating copy(b(:,:),a(:,:))
     24, Accelerator kernel generated
         Generating Tesla code
         25, !$acc loop gang, vector(128) collapse(2) ! blockidx%x threadidx%x
         26,   ! blockidx%x threadidx%x collapsed
     26, Loop not vectorized/parallelized: contains call
     36, sum reduction inlined
         Generated vector simd code for the loop containing reductions
         Generated a prefetch instruction for the loop
         Generated vector simd code for the loop containing reductions
         Generated a prefetch instruction for the loop
[test]$
More worrying is what happens at run time. When I run the code as shown, the output matches that of the serial version:
[test]$ ./test
501000499999440.0 501000499999440.0
But if I just swap the order of the last two assignments in the loop (test2.f90 in the diff below), sum(b) comes out as zero:
[test]$ diff -C 2 test2.f90 test.f90
*** test2.f90 Mon Apr 29 10:05:51 2019
--- test.f90 Mon Apr 29 10:05:42 2019
***************
*** 28,33 ****
  b(i,j) = (i*1000.0)*j
- b(i,j) = dou(b(i,j))
  a(i,j) = 2.0*a(i,j)
  end do
  end do
--- 28,33 ----
  b(i,j) = (i*1000.0)*j
  a(i,j) = 2.0*a(i,j)
+ b(i,j) = dou(b(i,j))
  end do
  end do
[test]$ pgf90 -acc -O3 test2.f90 -o test2
[test]$ ./test2
501000499999440.0 0.000000000000000
This looks like a textbook example, so I don’t understand how I can get this strange behaviour…
Ángel de Vicente