Hi TJ,
The problem with maxloc was that it’s a function call. After reworking the code a bit, the call is now inlined into the compute region, so it can be used there. This support was added in the 14.6 release. Here’s an example:
% cat test.f90
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
program routine_test
  implicit none
  integer,dimension(:,:), allocatable :: var1, var2
  real(8),dimension(:,:,:,:),allocatable :: testd
  integer :: i, j, k, l
  integer :: maxloc_test(2)
  allocate(var1(32,32), var2(32,32))
  allocate(testd(32,32,32,32))
  call random_number(testd)
  !$acc parallel loop collapse(2) private(maxloc_test)
  do i = 1, 32
    do j = 1, 32
      maxloc_test = maxloc( testd(i,j,:,:) )
      var1(i,j) = maxloc_test(1)
      var2(i,j) = maxloc_test(2)
    end do
  end do
  do i = 1, 32
    do j = 1, 32
      print *, i,j, testd(i,j,var1(i,j),var2(i,j))
    enddo
  enddo
end program routine_test
% pgf90 test.f90 -acc -Minfo=accel -V14.6
routine_test:
     15, Accelerator kernel generated
         16, !$acc loop gang, vector(256) collapse(2) ! blockidx%x threadidx%x
         17,   ! blockidx%x threadidx%x collapsed
     15, Generating present_or_copyin(testd(1:32,1:32,1:32,1:32))
         Generating present_or_copyout(var1(1:32,1:32))
         Generating present_or_copyout(var2(1:32,1:32))
         Generating Tesla code
     18, Loop carried scalar dependence for 'testd$vr' at line 18
         Loop carried dependence of 'maxloc_test' prevents parallelization
         Loop carried backward dependence of 'maxloc_test' prevents vectorization
% a.out
... cut ...
32 18 0.9972965973050236
32 19 0.9960167244840221
32 20 0.9986352990325713
32 21 0.9986596722570766
32 22 0.9979399150052046
32 23 0.9989466392062809
32 24 0.9996643184950784
32 25 0.9990432101468514
32 26 0.9998021665421817
32 27 0.9989971046207700
32 28 0.9996316733170829
32 29 0.9984658930228676
32 30 0.9985453888560727
32 31 0.9994895911071211
32 32 0.9996992909828322
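
If you need to build with a release that doesn't have this support, one possible workaround is to do the search by hand instead of calling maxloc. The following is only a rough sketch, not something taken from the compiler: the module name slice_maxloc_mod and the routine slice_maxloc are made up for illustration, and I'm assuming the slice has at least one element and contains no NaNs. The "!$acc routine seq" line is also an assumption about how you would mark it for device use; whether a particular release accepts the array section as an argument inside a compute region is something you'd need to check.

```fortran
module slice_maxloc_mod
  implicit none
contains
  ! Hypothetical helper (not part of the original example): returns the
  ! indices of the first maximum of a 2-D slice, scanning in array element
  ! order (first index fastest), which is the order maxloc uses for ties.
  subroutine slice_maxloc(a, loc)
    !$acc routine seq
    real(8), intent(in)  :: a(:,:)
    integer, intent(out) :: loc(2)
    real(8) :: best
    integer :: k, l
    loc  = 1
    best = a(1,1)
    do l = 1, size(a, 2)        ! outer loop over the second dimension
      do k = 1, size(a, 1)      ! inner loop over the first dimension
        if (a(k,l) > best) then ! strict > keeps the first maximum on ties
          best   = a(k,l)
          loc(1) = k
          loc(2) = l
        end if
      end do
    end do
  end subroutine slice_maxloc
end module slice_maxloc_mod
```

In the example program, the maxloc line would then become something like "call slice_maxloc(testd(i,j,:,:), maxloc_test)", with "use slice_maxloc_mod" added at the top of the program.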