Does "acc loop seq" work?

Hi!

I have nested loops. I marked the inner loops with “!$acc loop seq”, but the compiler's output looks as if it ignored this directive. I ran into this while writing a small program to reproduce another bug I had hit.

       program test

       integer, parameter :: SZ = 12800000

       integer :: i,j,k
       real*8  :: d(SZ)
       real*8  :: tmp(128)

       d(:) = 0.0
!$acc kernels loop private(tmp) independent
       do i=0,((SZ/128)-1)
!$acc loop seq
          do j=1,128
             tmp(j) = 1.0/j
          enddo
!$acc loop seq
          do j=1,128
             d(i*128+j) = tmp(j)*3.1415
          enddo
       enddo
!$acc end kernels loop

       print *, 'sum = ', sum(d)

       end program

Output:

 pgi$ pgfortran -acc -Minfo test.f90
test:
     13, Generating present_or_copy(d(:))
         Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
     14, Loop is parallelizable
     16, Loop is parallelizable
         Accelerator kernel generated
         14, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
         16, CC 1.3 : 15 registers; 20 shared, 48 constant, 0 local memory bytes
             CC 2.0 : 17 registers; 0 shared, 48 constant, 0 local memory bytes
     20, Loop is parallelizable
         Accelerator kernel generated
         14, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
         20, CC 1.3 : 10 registers; 24 shared, 4 constant, 0 local memory bytes
             CC 2.0 : 17 registers; 0 shared, 44 constant, 0 local memory bytes
     26, sum reduction inlined

The result is correct.
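For reference, the value the program should print can be worked out by hand, since every block of 128 elements holds the same values 3.1415/j:

```latex
\text{sum} = \sum_{i=0}^{SZ/128-1} \sum_{j=1}^{128} \frac{3.1415}{j}
           = \frac{SZ}{128} \cdot 3.1415 \cdot H_{128},
\qquad H_{128} = \sum_{j=1}^{128} \frac{1}{j} \approx 5.43315

% with SZ = 12800000, so SZ/128 = 100000:
\text{sum} \approx 100000 \cdot 3.1415 \cdot 5.43315 \approx 1.7068 \times 10^{6}
```

The printed value can differ from this in the last few digits, because 1.0/j and 3.1415 are default-kind (single precision) REAL expressions in the program.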

Any ideas? Should I use “acc parallel loop” instead of “kernels”?

Alexey

Hi Alexey,

What the compiler has done here is split your outer loop into two separate kernels, one per inner loop, looking something like:

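! first kernel (illustrative: tmp is promoted to one row per i)
!$acc kernels loop
       do i=0,((SZ/128)-1)
!$acc loop seq
          do j=1,128
             tmp(i,j) = 1.0/j
          enddo
       enddo

! second kernel reads the rows written by the first one
!$acc kernels loop
       do i=0,((SZ/128)-1)
!$acc loop seq
          do j=1,128
             d(i*128+j) = tmp(i,j)*3.1415
          enddo
       enddo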

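So the “seq” is being preserved; it's just that two different kernels are now created from the outer loop. (The extra i index on tmp reflects that, once the loops are split, each outer iteration needs its own copy of tmp that survives from the first kernel to the second.)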
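To make the split concrete, here is a host-only sketch (an illustration of loop fission, not what the compiler literally emits) that computes the fused loop and the split form side by side. The module name split_demo, the subroutines fused and split, the array tmp2, and the reduced size of 1000 blocks are hypothetical choices for the example; the OpenACC directives are left out so it runs serially with any Fortran 2008 compiler.

```fortran
! Hypothetical demo module: serial sketch of how one outer loop with a
! private temporary becomes two loops with an expanded temporary.
module split_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nb = 1000   ! hypothetical: 1000 blocks instead of SZ/128
  integer, parameter :: bs = 128    ! block length, as in the original program
contains

  ! Fused form: one outer loop; tmp is reused on every iteration,
  ! which is what private(tmp) allows on the accelerator.
  subroutine fused(d)
    real(dp), intent(out) :: d(nb*bs)
    real(dp) :: tmp(bs)
    integer :: i, j
    do i = 0, nb-1
      do j = 1, bs
        tmp(j) = 1.0/j
      end do
      do j = 1, bs
        d(i*bs+j) = tmp(j)*3.1415
      end do
    end do
  end subroutine fused

  ! Split form: two outer loops, one per would-be kernel. The temporary
  ! must now keep one row per i, because the second loop reads what the
  ! first loop wrote.
  subroutine split(d)
    real(dp), intent(out) :: d(nb*bs)
    real(dp), allocatable :: tmp2(:,:)
    integer :: i, j
    allocate(tmp2(0:nb-1, bs))
    do i = 0, nb-1              ! first kernel
      do j = 1, bs
        tmp2(i,j) = 1.0/j
      end do
    end do
    do i = 0, nb-1              ! second kernel
      do j = 1, bs
        d(i*bs+j) = tmp2(i,j)*3.1415
      end do
    end do
    deallocate(tmp2)
  end subroutine split

end module split_demo
```

Both subroutines perform the same arithmetic, so their results should match exactly; the cost of the split form is the extra nb×bs temporary, which is why forcing a single kernel can be worthwhile.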

Should I use “acc parallel loop” instead of “kernels”?

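If you want to force the use of a single kernel, then yes, you can use “parallel” here: a parallel region is launched as one kernel, and the compiler does not split it the way it may split a kernels region. Note that “independent” is no longer needed, since loops in a parallel region are already treated as independent.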

% cat f10_2_12.f90 
       program test

       integer, parameter :: SZ = 12800000

       integer :: i,j,k
       real*8  :: d(SZ)
       real*8  :: tmp(128)

       d(:) = 0.0
!$acc parallel loop private(tmp) 
       do i=0,((SZ/128)-1)
!$acc loop seq
          do j=1,128
             tmp(j) = 1.0/j
          enddo
!$acc loop seq
          do j=1,128
             d(i*128+j) = tmp(j)*3.1415
          enddo
       enddo

       print *, 'sum = ', sum(d)

       end program
% pgf90 -acc -Minfo=accel f10_2_12.f90 -V12.9
test:
     10, Accelerator kernel generated
         10, CC 1.3 : 17 registers; 32 shared, 48 constant, 0 local memory bytes
             CC 2.0 : 22 registers; 0 shared, 56 constant, 0 local memory bytes
         11, !$acc loop gang, vector(256) ! blockidx%x threadidx%x
     10, Generating present_or_copy(d(:))
         Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
     13, Loop is parallelizable
     17, Loop is parallelizable

Hope this helps,
Mat

Thank you, Mat!