maxloc with OpenACC

Hi,

I know there have been some issues with maxloc/minloc in OpenACC, but the last post on the topic is more than a year old and I haven’t seen an update since. I am trying to translate the following simple OpenMP code to OpenACC:

!$omp parallel do private(maxloc_test)
do i = 1, 100
do j = 1, 100
    maxloc_test = maxloc( testd(i,j,:,:) )
    var1(i,j) = maxloc_test(1)
    var2(i,j) = maxloc_test(2)
end do
end do
!$omp end parallel do

Could anyone tell me whether maxloc is now supported, and how to handle the private clause in OpenACC when the loop is nested within a data region?

Thanks for your time and effort to read this.
Best,
TJ

Hi TJ,

The problem with maxloc was that it’s a function call. After some rework, the call is now inlined into the compute region, so it can be used there. This support was added in the 13.6 release. Here’s an example:

% cat test.f90
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
program routine_test
  implicit none

  integer,dimension(:,:), allocatable :: var1, var2
  real(8),dimension(:,:,:,:),allocatable :: testd
  integer               :: i, j, k, l
  integer :: maxloc_test(2)

  allocate(var1(32,32), var2(32,32))
  allocate(testd(32,32,32,32))
  call random_number(testd)


!$acc parallel loop collapse(2) private(maxloc_test)
 do i = 1, 32
 do j = 1, 32
     maxloc_test = maxloc( testd(i,j,:,:) )
     var1(i,j) = maxloc_test(1)
     var2(i,j) = maxloc_test(2)
 end do
 end do
 do i = 1, 32
 do j = 1, 32
   print *, i,j, testd(i,j,var1(i,j),var2(i,j))
 enddo
 enddo

end program routine_test

% pgf90 test.f90 -acc -Minfo=accel -V14.6
routine_test:
     15, Accelerator kernel generated
         16, !$acc loop gang, vector(256) collapse(2) ! blockidx%x threadidx%x
         17,   ! blockidx%x threadidx%x collapsed
     15, Generating present_or_copyin(testd(1:32,1:32,1:32,1:32))
         Generating present_or_copyout(var1(1:32,1:32))
         Generating present_or_copyout(var2(1:32,1:32))
         Generating Tesla code
     18, Loop carried scalar dependence for 'testd$vr' at line 18
         Loop carried dependence of 'maxloc_test' prevents parallelization
         Loop carried backward dependence of 'maxloc_test' prevents vectorization
% a.out
... cut ...
           32           18   0.9972965973050236
           32           19   0.9960167244840221
           32           20   0.9986352990325713
           32           21   0.9986596722570766
           32           22   0.9979399150052046
           32           23   0.9989466392062809
           32           24   0.9996643184950784
           32           25   0.9990432101468514
           32           26   0.9998021665421817
           32           27   0.9989971046207700
           32           28   0.9996316733170829
           32           29   0.9984658930228676
           32           30   0.9985453888560727
           32           31   0.9994895911071211
           32           32   0.9996992909828322
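
If you are stuck on a compiler version where maxloc is not accepted inside a compute region, one possible fallback is to write the search by hand. The following is only a sketch and not something tested against any particular PGI release; the module name manual_maxloc, the routine name maxloc_last2, and the variables best, kmax and lmax are hypothetical names chosen for illustration. It assumes the last two dimensions of testd each have at least one element.

```fortran
module manual_maxloc
  implicit none
contains
  ! Hypothetical helper, not from the posts above.
  ! Assumes the last two dimensions of testd each have at least one element.
  subroutine maxloc_last2(testd, var1, var2)
    real(8), intent(in)  :: testd(:,:,:,:)
    integer, intent(out) :: var1(:,:), var2(:,:)
    integer :: i, j, k, l, kmax, lmax
    real(8) :: best

    !$acc parallel loop collapse(2) private(best, kmax, lmax) &
    !$acc   copyin(testd) copyout(var1, var2)
    do i = 1, size(testd, 1)
      do j = 1, size(testd, 2)
        best = testd(i,j,1,1)
        kmax = 1
        lmax = 1
        ! Scan in array element order (k fastest) and use a strict ">"
        ! so that ties resolve to the first maximum, as maxloc does.
        !$acc loop seq
        do l = 1, size(testd, 4)
          !$acc loop seq
          do k = 1, size(testd, 3)
            if (testd(i,j,k,l) > best) then
              best = testd(i,j,k,l)
              kmax = k
              lmax = l
            end if
          end do
        end do
        var1(i,j) = kmax
        var2(i,j) = lmax
      end do
    end do
  end subroutine maxloc_last2
end module manual_maxloc
```

Calling maxloc_last2(testd, var1, var2) fills var1 and var2 the same way the maxloc version above does. Built without -acc, the directives are treated as comments and the routine runs serially, which makes it easy to compare against the intrinsic.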

Thanks for your answer :)

The other thing is that I’d like the loop to be nested within a data region as below, but it’s not allowed because maxloc_test is already included in the data clause: " A data clause for a variable appears within another region with a data clause for the same variable policy_temp"

!$acc data copyout(var1, var2) &
!$acc           copyin(testd)        &
!$acc           create(maxloc_test)

!$acc parallel loop collapse(2) private(maxloc_test) 
 do i = 1, 32 
 do j = 1, 32 
     maxloc_test = maxloc( testd(i,j,:,:) ) 
     var1(i,j) = maxloc_test(1) 
     var2(i,j) = maxloc_test(2) 
 end do 
 end do

!$acc end data

How should I fix this?

What are you trying to do by putting “maxloc_test” into the data region?

When you use “private”, every thread gets its own copy of “maxloc_test”. By putting it in a data region, you’re saying that all threads will share the same array.
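
To make the distinction concrete, here is a minimal sketch of how the loop could sit inside a data region while “maxloc_test” stays private. It assumes the same array shapes as the example earlier in the thread; the module name maxloc_data_region and the routine name find_maxloc are hypothetical, introduced only for illustration. Only the shared arrays appear in the data clauses, and “maxloc_test” appears only in the private clause.

```fortran
module maxloc_data_region
  implicit none
contains
  ! Hypothetical helper, not from the posts above: for each (i,j), find the
  ! location of the maximum over the last two dimensions of testd.
  subroutine find_maxloc(testd, var1, var2)
    real(8), intent(in)  :: testd(:,:,:,:)
    integer, intent(out) :: var1(:,:), var2(:,:)
    integer :: i, j
    integer :: maxloc_test(2)

    ! Shared arrays go in the data region; maxloc_test is left out.
    !$acc data copyin(testd) copyout(var1, var2)

    ! maxloc_test is listed only as private, so each thread has its own copy.
    !$acc parallel loop collapse(2) private(maxloc_test)
    do i = 1, size(testd, 1)
      do j = 1, size(testd, 2)
        maxloc_test = maxloc(testd(i,j,:,:))
        var1(i,j) = maxloc_test(1)
        var2(i,j) = maxloc_test(2)
      end do
    end do

    !$acc end data
  end subroutine find_maxloc
end module maxloc_data_region
```

The data region then only controls when testd, var1 and var2 move between host and device; the per-thread scratch array never needs a data clause of its own.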

Thank you so much, again.

I think my understanding of data regions must be wrong: I assumed that every array used in a nested compute region has to be declared in the enclosing data region (copy, copyin/copyout, or create).

But when I only use “private”, I end up with the error message:
call to cuMemFree returned error 1 : invalid value

I’m not sure what the reason is.

Best,
TJ

Hi TJ,

But when I only use “private”, I end up with the error message:
call to cuMemFree returned error 1 : invalid value

Can you post a reproducing example?

  • Mat