diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md b/flang/docs/DoConcurrentConversionToOpenMP.md index 62bc3172f8e3b..7b49af742f242 100644 --- a/flang/docs/DoConcurrentConversionToOpenMP.md +++ b/flang/docs/DoConcurrentConversionToOpenMP.md @@ -53,6 +53,79 @@ that: * It has been tested in a very limited way so far. * It has been tested mostly on simple synthetic inputs. +### Loop nest detection + +On the `FIR` dialect level, the following loop: +```fortran + do concurrent(i=1:n, j=1:m, k=1:o) + a(i,j,k) = i + j + k + end do +``` +is modelled as a nest of `fir.do_loop` ops such that an outer loop's region +contains **only** the following: + 1. The operations needed to assign/update the outer loop's induction variable. + 1. The inner loop itself. + +So the MLIR structure for the above example looks similar to the following: +``` + fir.do_loop %i_idx = %34 to %36 step %c1 unordered { + %i_idx_2 = fir.convert %i_idx : (index) -> i32 + fir.store %i_idx_2 to %i_iv#1 : !fir.ref + + fir.do_loop %j_idx = %37 to %39 step %c1_3 unordered { + %j_idx_2 = fir.convert %j_idx : (index) -> i32 + fir.store %j_idx_2 to %j_iv#1 : !fir.ref + + fir.do_loop %k_idx = %40 to %42 step %c1_5 unordered { + %k_idx_2 = fir.convert %k_idx : (index) -> i32 + fir.store %k_idx_2 to %k_iv#1 : !fir.ref + + ... loop nest body goes here ... + } + } + } +``` +This applies to multi-range loops in general; they are represented in the IR as +a nest of `fir.do_loop` ops with the above nesting structure. + +Therefore, the pass detects such "perfectly" nested loop ops to identify multi-range +loops and map them as "collapsed" loops in OpenMP. + +#### Further info regarding loop nest detection + +Loop nest detection is currently limited to the scenario described in the previous +section. However, this is quite limited and can be extended in the future to cover +more cases. At the moment, for the following loop nest, even though both loops are +perfectly nested, only the outer loop is parallelized: +```fortran +do concurrent(i=1:n) + do concurrent(j=1:m) + a(i,j) = i * j + end do +end do +``` + +Similarly, for the following loop nest, even though the intervening statement `x = 41` +does not have any memory effects that would affect parallelization, this nest is +not parallelized either (only the outer loop is). + +```fortran +do concurrent(i=1:n) + x = 41 + do concurrent(j=1:m) + a(i,j) = i * j + end do +end do +``` + +The above also has the consequence that the `j` variable will **not** be +privatized in the OpenMP parallel/target region. In other words, it will be +treated as if it was a `shared` variable. For more details about privatization, +see the "Data environment" section below. + +See `flang/test/Transforms/DoConcurrent/loop_nest_test.f90` for more examples +of what is and is not detected as a perfect loop nest. +