[flang][fir] Add MLIR op for do concurrent #130893


Merged

merged 4 commits into main on Mar 18, 2025

Conversation

Member

@ergawy ergawy commented Mar 12, 2025

Adds new MLIR ops to model `do concurrent`. To make the `do concurrent` representation self-contained, a loop is modeled using two ops: a wrapper and an op that contains the actual body of the loop. For example, a 2D `do concurrent` loop is modeled as follows:

  fir.do_concurrent {
    %i = fir.alloca i32
    %j = fir.alloca i32
    fir.do_concurrent.loop
      (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st) {
      %0 = fir.convert %i_iv : (index) -> i32
      fir.store %0 to %i : !fir.ref<i32>

      %1 = fir.convert %j_iv : (index) -> i32
      fir.store %1 to %j : !fir.ref<i32>
    }
  }

The fir.do_concurrent wrapper op encapsulates both the actual loop and the allocations required for the iteration variables. The fir.do_concurrent.loop op is a multi-dimensional op that contains the loop control and body. See the ops' docs for more info.
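As discussed during review, one natural lowering target for this wrapper/loop pair is a nest of unordered `fir.do_loop` ops. The following is a hand-written sketch only (not the output of an actual lowering pass), assuming the iteration-variable allocas `%i` and `%j` have been hoisted into the surrounding scope:

```mlir
// Hypothetical lowering of the 2D nest above to unordered fir.do_loop ops.
fir.do_loop %i_iv = %i_lb to %i_ub step %i_st unordered {
  fir.do_loop %j_iv = %j_lb to %j_ub step %j_st unordered {
    %0 = fir.convert %i_iv : (index) -> i32
    fir.store %0 to %i : !fir.ref<i32>
    %1 = fir.convert %j_iv : (index) -> i32
    fir.store %1 to %j : !fir.ref<i32>
  }
}
```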

@ergawy ergawy requested review from jeanPerier and vzakhari March 12, 2025 04:54
@llvmbot llvmbot added the flang (Flang issues not falling into any other category) and flang:fir-hlfir labels Mar 12, 2025
@llvmbot
Member

llvmbot commented Mar 12, 2025

@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)

Changes

Adds new MLIR ops to model `do concurrent`. To make the `do concurrent` representation self-contained, a loop is modeled using two ops: a wrapper and an op that contains the actual body of the loop. For example, a 2D `do concurrent` loop is modeled as follows:

  hlfir.do_concurrent {
    %i = fir.alloca i32
    %j = fir.alloca i32
    hlfir.do_concurrent.loop
      (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st) {
      %0 = fir.convert %i_iv : (index) -> i32
      fir.store %0 to %i : !fir.ref<i32>

      %1 = fir.convert %j_iv : (index) -> i32
      fir.store %1 to %j : !fir.ref<i32>
    }
  }

The hlfir.do_concurrent wrapper op encapsulates both the actual loop and the allocations required for the iteration variables. The hlfir.do_concurrent.loop op is a multi-dimensional op that contains the loop control and body. See the ops' docs for more info.


Full diff: https://github.com/llvm/llvm-project/pull/130893.diff

4 Files Affected:

  • (modified) flang/include/flang/Optimizer/HLFIR/HLFIROps.td (+116)
  • (modified) flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp (+163)
  • (added) flang/test/HLFIR/do_concurrent.fir (+92)
  • (modified) flang/test/HLFIR/invalid.fir (+95)
diff --git a/flang/include/flang/Optimizer/HLFIR/HLFIROps.td b/flang/include/flang/Optimizer/HLFIR/HLFIROps.td
index f69930d5b53b3..089c67af5d313 100644
--- a/flang/include/flang/Optimizer/HLFIR/HLFIROps.td
+++ b/flang/include/flang/Optimizer/HLFIR/HLFIROps.td
@@ -21,6 +21,7 @@ include "flang/Optimizer/Dialect/FIRAttr.td"
 include "flang/Optimizer/Dialect/FortranVariableInterface.td"
 include "mlir/Dialect/Arith/IR/ArithBase.td"
 include "mlir/Dialect/Arith/IR/ArithOpsInterfaces.td"
+include "mlir/Dialect/LLVMIR/LLVMAttrDefs.td"
 include "mlir/IR/BuiltinAttributes.td"
 
 // Base class for FIR operations.
@@ -1863,5 +1864,120 @@ def hlfir_EvaluateInMemoryOp : hlfir_Op<"eval_in_mem", [AttrSizedOperandSegments
   let hasVerifier = 1;
 }
 
+def hlfir_DoConcurrentOp : hlfir_Op<"do_concurrent", [SingleBlock]> {
+  let summary = "do concurrent loop wrapper";
+
+  let description = [{
+    A wrapper operation for the actual op modeling `do concurrent` loops:
+    `hlfir.do_concurrent.loop` (see op declaration below for more info about it).
+
+    The `hlfir.do_concurrent` wrapper op consists of one single-block region with
+    the following properties:
+    - The first ops in the region are responsible for allocating storage for the
+      loop's iteration variables. This property is **not** enforced by the op
+      verifier, but expected to be respected when building the op.
+    - The terminator of the region is an instance of `hlfir.do_concurrent.loop`.
+
+    For example, a 2D loop nest would be represented as follows:
+    ```
+    hlfir.do_concurrent {
+      %i = fir.alloca i32
+      %j = fir.alloca i32
+      hlfir.do_concurrent.loop ...
+    }
+    ```
+  }];
+
+  let regions = (region SizedRegion<1>:$region);
+
+  let assemblyFormat = "$region attr-dict";
+  let hasVerifier = 1;
+}
+
+def hlfir_DoConcurrentLoopOp : hlfir_Op<"do_concurrent.loop",
+    [AttrSizedOperandSegments, DeclareOpInterfaceMethods<LoopLikeOpInterface>,
+     Terminator, NoTerminator, SingleBlock, ParentOneOf<["DoConcurrentOp"]>]> {
+  let summary = "do concurrent loop";
+
+  let description = [{
+    An operation that models a Fortran `do concurrent` loop's header and block.
+    This is a single-region single-block terminator op that is expected to
+    terminate the region of an `hlfir.do_concurrent` wrapper op.
+
+    This op borrows from both `scf.parallel` and `fir.do_loop` ops. Similar to
+    `scf.parallel`, a loop nest takes 3 groups of SSA values as operands that
+    represent the lower bounds, upper bounds, and steps. Similar to `fir.do_loop`,
+    the op takes one additional group of SSA values to represent reductions.
+
+    The body region **does not** have a terminator.
+
+    For example, a 2D loop nest with 2 reductions (sum and max) would be
+    represented as follows:
+    ```
+    // The wrapper of the loop
+    hlfir.do_concurrent {
+      %i = fir.alloca i32
+      %j = fir.alloca i32
+
+      // The actual `do concurrent` loop
+      hlfir.do_concurrent.loop
+        (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st)
+        reduce(#fir.reduce_attr<add> -> %sum : !fir.ref<i32>,
+               #fir.reduce_attr<max> -> %max : !fir.ref<f32>) {
+
+        %0 = fir.convert %i_iv : (index) -> i32
+        fir.store %0 to %i : !fir.ref<i32>
+
+        %1 = fir.convert %j_iv : (index) -> i32
+        fir.store %1 to %j : !fir.ref<i32>
+
+        // ... loop body goes here ...
+      }
+    }
+    ```
+
+    Description of arguments:
+    - `lowerBound`: The group of SSA values for the nest's lower bounds.
+    - `upperBound`: The group of SSA values for the nest's upper bounds.
+    - `step`: The group of SSA values for the nest's steps.
+    - `reduceOperands`: The reduction SSA values, if any.
+    - `reduceAttrs`: Attributes to store reduction operations, if any.
+    - `loopAnnotation`: Loop metadata to be passed down the compiler pipeline to
+      LLVM.
+  }];
+
+  let arguments = (ins
+    Variadic<Index>:$lowerBound,
+    Variadic<Index>:$upperBound,
+    Variadic<Index>:$step,
+    Variadic<AnyType>:$reduceOperands,
+    OptionalAttr<ArrayAttr>:$reduceAttrs,
+    OptionalAttr<LoopAnnotationAttr>:$loopAnnotation
+  );
+
+  let regions = (region SizedRegion<1>:$region);
+
+  let hasCustomAssemblyFormat = 1;
+  let hasVerifier = 1;
+
+  let extraClassDeclaration = [{
+    /// Get Number of variadic operands
+    unsigned getNumOperands(unsigned segmentIdx) {
+      auto segments = (*this)->getAttrOfType<mlir::DenseI32ArrayAttr>(
+        getOperandSegmentSizeAttr());
+      return static_cast<unsigned>(segments[segmentIdx]);
+    }
+
+    // Get Number of reduction operands
+    unsigned getNumReduceOperands() {
+      return getNumOperands(3);
+    }
+
+    /// Does the operation hold operands for reduction variables
+    bool hasReduceOperands() {
+      return getNumReduceOperands() > 0;
+    }
+  }];
+}
 
 #endif // FORTRAN_DIALECT_HLFIR_OPS
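The `operandSegmentSizes` bookkeeping used by `getNumOperands`/`getNumReduceOperands` above can be illustrated without MLIR. A minimal standalone C++ sketch (the `SegmentSizes` helper is hypothetical; the real op reads a `DenseI32ArrayAttr` instead):

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <numeric>

// Stand-in for the dense-i32-array attribute stored under
// "operandSegmentSizes": one size per variadic operand group, in
// declaration order: lowerBound, upperBound, step, reduceOperands.
struct SegmentSizes {
  std::array<int32_t, 4> sizes;

  // Mirrors DoConcurrentLoopOp::getNumOperands(segmentIdx).
  unsigned numOperands(unsigned segmentIdx) const {
    return static_cast<unsigned>(sizes[segmentIdx]);
  }

  // Mirrors getNumReduceOperands(): reductions are segment 3.
  unsigned numReduceOperands() const { return numOperands(3); }

  // Offset of a segment's first operand in the op's flat operand list.
  unsigned segmentOffset(unsigned segmentIdx) const {
    return std::accumulate(sizes.begin(), sizes.begin() + segmentIdx, 0u);
  }
};
```

For a 2D nest with one reduction the sizes would be {2, 2, 2, 1}, so the single reduce operand sits at flat operand index 6.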
diff --git a/flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp b/flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp
index 8851a3a7187b9..576dc601aff14 100644
--- a/flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp
+++ b/flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp
@@ -12,6 +12,7 @@
 
 #include "flang/Optimizer/HLFIR/HLFIROps.h"
 
+#include "flang/Optimizer/Dialect/FIRAttr.h"
 #include "flang/Optimizer/Dialect/FIROpsSupport.h"
 #include "flang/Optimizer/Dialect/FIRType.h"
 #include "flang/Optimizer/Dialect/Support/FIRContext.h"
@@ -2246,6 +2247,168 @@ llvm::LogicalResult hlfir::EvaluateInMemoryOp::verify() {
   return mlir::success();
 }
 
+//===----------------------------------------------------------------------===//
+// DoConcurrentOp
+//===----------------------------------------------------------------------===//
+
+llvm::LogicalResult hlfir::DoConcurrentOp::verify() {
+  mlir::Block *body = getBody();
+
+  if (body->empty())
+    return emitOpError("body cannot be empty");
+
+  if (!body->mightHaveTerminator() ||
+      !mlir::isa<hlfir::DoConcurrentLoopOp>(body->getTerminator()))
+    return emitOpError("must be terminated by 'hlfir.do_concurrent.loop'");
+
+  return mlir::success();
+}
+
+//===----------------------------------------------------------------------===//
+// DoConcurrentLoopOp
+//===----------------------------------------------------------------------===//
+
+mlir::ParseResult
+hlfir::DoConcurrentLoopOp::parse(mlir::OpAsmParser &parser,
+                                 mlir::OperationState &result) {
+  auto &builder = parser.getBuilder();
+  // Parse an opening `(` followed by induction variables followed by `)`
+  llvm::SmallVector<mlir::OpAsmParser::Argument, 4> ivs;
+  if (parser.parseArgumentList(ivs, mlir::OpAsmParser::Delimiter::Paren))
+    return mlir::failure();
+
+  // Parse loop bounds.
+  llvm::SmallVector<mlir::OpAsmParser::UnresolvedOperand, 4> lower;
+  if (parser.parseEqual() ||
+      parser.parseOperandList(lower, ivs.size(),
+                              mlir::OpAsmParser::Delimiter::Paren) ||
+      parser.resolveOperands(lower, builder.getIndexType(), result.operands))
+    return mlir::failure();
+
+  llvm::SmallVector<mlir::OpAsmParser::UnresolvedOperand, 4> upper;
+  if (parser.parseKeyword("to") ||
+      parser.parseOperandList(upper, ivs.size(),
+                              mlir::OpAsmParser::Delimiter::Paren) ||
+      parser.resolveOperands(upper, builder.getIndexType(), result.operands))
+    return mlir::failure();
+
+  // Parse step values.
+  llvm::SmallVector<mlir::OpAsmParser::UnresolvedOperand, 4> steps;
+  if (parser.parseKeyword("step") ||
+      parser.parseOperandList(steps, ivs.size(),
+                              mlir::OpAsmParser::Delimiter::Paren) ||
+      parser.resolveOperands(steps, builder.getIndexType(), result.operands))
+    return mlir::failure();
+
+  llvm::SmallVector<mlir::OpAsmParser::UnresolvedOperand> reduceOperands;
+  llvm::SmallVector<mlir::Type> reduceArgTypes;
+  if (succeeded(parser.parseOptionalKeyword("reduce"))) {
+    // Parse reduction attributes and variables.
+    llvm::SmallVector<fir::ReduceAttr> attributes;
+    if (failed(parser.parseCommaSeparatedList(
+            mlir::AsmParser::Delimiter::Paren, [&]() {
+              if (parser.parseAttribute(attributes.emplace_back()) ||
+                  parser.parseArrow() ||
+                  parser.parseOperand(reduceOperands.emplace_back()) ||
+                  parser.parseColonType(reduceArgTypes.emplace_back()))
+                return mlir::failure();
+              return mlir::success();
+            })))
+      return mlir::failure();
+    // Resolve input operands.
+    for (auto operand_type : llvm::zip(reduceOperands, reduceArgTypes))
+      if (parser.resolveOperand(std::get<0>(operand_type),
+                                std::get<1>(operand_type), result.operands))
+        return mlir::failure();
+    llvm::SmallVector<mlir::Attribute> arrayAttr(attributes.begin(),
+                                                 attributes.end());
+    result.addAttribute(getReduceAttrsAttrName(result.name),
+                        builder.getArrayAttr(arrayAttr));
+  }
+
+  // Now parse the body.
+  mlir::Region *body = result.addRegion();
+  for (auto &iv : ivs)
+    iv.type = builder.getIndexType();
+  if (parser.parseRegion(*body, ivs))
+    return mlir::failure();
+
+  // Set `operandSegmentSizes` attribute.
+  result.addAttribute(DoConcurrentLoopOp::getOperandSegmentSizeAttr(),
+                      builder.getDenseI32ArrayAttr(
+                          {static_cast<int32_t>(lower.size()),
+                           static_cast<int32_t>(upper.size()),
+                           static_cast<int32_t>(steps.size()),
+                           static_cast<int32_t>(reduceOperands.size())}));
+
+  // Parse attributes.
+  if (parser.parseOptionalAttrDict(result.attributes))
+    return mlir::failure();
+
+  return mlir::success();
+}
+
+void hlfir::DoConcurrentLoopOp::print(mlir::OpAsmPrinter &p) {
+  p << " (" << getBody()->getArguments() << ") = (" << getLowerBound()
+    << ") to (" << getUpperBound() << ") step (" << getStep() << ")";
+
+  if (hasReduceOperands()) {
+    p << " reduce(";
+    auto attrs = getReduceAttrsAttr();
+    auto operands = getReduceOperands();
+    llvm::interleaveComma(llvm::zip(attrs, operands), p, [&](auto it) {
+      p << std::get<0>(it) << " -> " << std::get<1>(it) << " : "
+        << std::get<1>(it).getType();
+    });
+    p << ')';
+  }
+
+  p << ' ';
+  p.printRegion(getRegion(), /*printEntryBlockArgs=*/false);
+  p.printOptionalAttrDict(
+      (*this)->getAttrs(),
+      /*elidedAttrs=*/{DoConcurrentLoopOp::getOperandSegmentSizeAttr(),
+                       DoConcurrentLoopOp::getReduceAttrsAttrName()});
+}
+
+llvm::SmallVector<mlir::Region *> hlfir::DoConcurrentLoopOp::getLoopRegions() {
+  return {&getRegion()};
+}
+
+llvm::LogicalResult hlfir::DoConcurrentLoopOp::verify() {
+  mlir::Operation::operand_range lbValues = getLowerBound();
+  mlir::Operation::operand_range ubValues = getUpperBound();
+  mlir::Operation::operand_range stepValues = getStep();
+
+  if (lbValues.empty())
+    return emitOpError(
+        "needs at least one tuple element for lowerBound, upperBound and step");
+
+  if (lbValues.size() != ubValues.size() ||
+      ubValues.size() != stepValues.size())
+    return emitOpError(
+        "different number of tuple elements for lowerBound, upperBound or step");
+
+  // Check that the body defines the same number of block arguments as the
+  // number of tuple elements in step.
+  mlir::Block *body = getBody();
+  if (body->getNumArguments() != stepValues.size())
+    return emitOpError() << "expects the same number of induction variables: "
+                         << body->getNumArguments()
+                         << " as bound and step values: " << stepValues.size();
+  for (auto arg : body->getArguments())
+    if (!arg.getType().isIndex())
+      return emitOpError(
+          "expects arguments for the induction variable to be of index type");
+
+  auto reduceAttrs = getReduceAttrsAttr();
+  if (getNumReduceOperands() != (reduceAttrs ? reduceAttrs.size() : 0))
+    return emitOpError(
+        "mismatch in number of reduction variables and reduction attributes");
+
+  return mlir::success();
+}
+
 #include "flang/Optimizer/HLFIR/HLFIROpInterfaces.cpp.inc"
 #define GET_OP_CLASSES
 #include "flang/Optimizer/HLFIR/HLFIREnums.cpp.inc"
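The structural checks performed by `DoConcurrentLoopOp::verify` above boil down to a handful of arity comparisons. A standalone C++ distillation (the `verifyLoopControl` helper is hypothetical and has no MLIR dependency; it returns the diagnostic text, or `std::nullopt` on success):

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Distilled version of DoConcurrentLoopOp::verify(): only the count
// checks, since the type and parent-op checks need real MLIR types.
std::optional<std::string>
verifyLoopControl(std::size_t numLbs, std::size_t numUbs, std::size_t numSteps,
                  std::size_t numIVs, std::size_t numReduceOperands,
                  std::size_t numReduceAttrs) {
  if (numLbs == 0)
    return "needs at least one tuple element for lowerBound, upperBound and step";
  if (numLbs != numUbs || numUbs != numSteps)
    return "different number of tuple elements for lowerBound, upperBound or step";
  if (numIVs != numSteps)
    return "expects the same number of induction variables as bound and step values";
  if (numReduceOperands != numReduceAttrs)
    return "mismatch in number of reduction variables and reduction attributes";
  return std::nullopt;
}
```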
diff --git a/flang/test/HLFIR/do_concurrent.fir b/flang/test/HLFIR/do_concurrent.fir
new file mode 100644
index 0000000000000..aef9db2236a57
--- /dev/null
+++ b/flang/test/HLFIR/do_concurrent.fir
@@ -0,0 +1,92 @@
+// Test hlfir.do_concurrent operation parse, verify (no errors), and unparse
+
+// RUN: fir-opt %s | fir-opt | FileCheck %s
+
+func.func @dc_1d(%i_lb: index, %i_ub: index, %i_st: index) {
+  hlfir.do_concurrent {
+    %i = fir.alloca i32
+    hlfir.do_concurrent.loop (%i_iv) = (%i_lb) to (%i_ub) step (%i_st) {
+      %0 = fir.convert %i_iv : (index) -> i32
+      fir.store %0 to %i : !fir.ref<i32>
+    }
+  }
+  return
+}
+
+// CHECK-LABEL: func.func @dc_1d
+// CHECK-SAME:    (%[[I_LB:.*]]: index, %[[I_UB:.*]]: index, %[[I_ST:.*]]: index)
+// CHECK:         hlfir.do_concurrent {
+// CHECK:           %[[I:.*]] = fir.alloca i32
+// CHECK:           hlfir.do_concurrent.loop (%[[I_IV:.*]]) = (%[[I_LB]]) to (%[[I_UB]]) step (%[[I_ST]]) {
+// CHECK:             %[[I_IV_CVT:.*]] = fir.convert %[[I_IV]] : (index) -> i32
+// CHECK:             fir.store %[[I_IV_CVT]] to %[[I]] : !fir.ref<i32>
+// CHECK:           }
+// CHECK:         }
+
+func.func @dc_2d(%i_lb: index, %i_ub: index, %i_st: index,
+                 %j_lb: index, %j_ub: index, %j_st: index) {
+  hlfir.do_concurrent {
+    %i = fir.alloca i32
+    %j = fir.alloca i32
+    hlfir.do_concurrent.loop
+      (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st) {
+      %0 = fir.convert %i_iv : (index) -> i32
+      fir.store %0 to %i : !fir.ref<i32>
+
+      %1 = fir.convert %j_iv : (index) -> i32
+      fir.store %1 to %j : !fir.ref<i32>
+    }
+  }
+  return
+}
+
+// CHECK-LABEL: func.func @dc_2d
+// CHECK-SAME:    (%[[I_LB:.*]]: index, %[[I_UB:.*]]: index, %[[I_ST:.*]]: index, %[[J_LB:.*]]: index, %[[J_UB:.*]]: index, %[[J_ST:.*]]: index)
+// CHECK:         hlfir.do_concurrent {
+// CHECK:           %[[I:.*]] = fir.alloca i32
+// CHECK:           %[[J:.*]] = fir.alloca i32
+// CHECK:           hlfir.do_concurrent.loop
+// CHECK-SAME:        (%[[I_IV:.*]], %[[J_IV:.*]]) = (%[[I_LB]], %[[J_LB]]) to (%[[I_UB]], %[[J_UB]]) step (%[[I_ST]], %[[J_ST]]) {
+// CHECK:             %[[I_IV_CVT:.*]] = fir.convert %[[I_IV]] : (index) -> i32
+// CHECK:             fir.store %[[I_IV_CVT]] to %[[I]] : !fir.ref<i32>
+// CHECK:             %[[J_IV_CVT:.*]] = fir.convert %[[J_IV]] : (index) -> i32
+// CHECK:             fir.store %[[J_IV_CVT]] to %[[J]] : !fir.ref<i32>
+// CHECK:           }
+// CHECK:         }
+
+func.func @dc_2d_reduction(%i_lb: index, %i_ub: index, %i_st: index,
+                 %j_lb: index, %j_ub: index, %j_st: index) {
+  %sum = fir.alloca i32
+
+  hlfir.do_concurrent {
+    %i = fir.alloca i32
+    %j = fir.alloca i32
+    hlfir.do_concurrent.loop
+      (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st) 
+      reduce(#fir.reduce_attr<add> -> %sum : !fir.ref<i32>) {
+      %0 = fir.convert %i_iv : (index) -> i32
+      fir.store %0 to %i : !fir.ref<i32>
+
+      %1 = fir.convert %j_iv : (index) -> i32
+      fir.store %1 to %j : !fir.ref<i32>
+    }
+  }
+  return
+}
+
+// CHECK-LABEL: func.func @dc_2d_reduction
+// CHECK-SAME:    (%[[I_LB:.*]]: index, %[[I_UB:.*]]: index, %[[I_ST:.*]]: index, %[[J_LB:.*]]: index, %[[J_UB:.*]]: index, %[[J_ST:.*]]: index)
+
+// CHECK:         %[[SUM:.*]] = fir.alloca i32
+
+// CHECK:         hlfir.do_concurrent {
+// CHECK:           %[[I:.*]] = fir.alloca i32
+// CHECK:           %[[J:.*]] = fir.alloca i32
+// CHECK:           hlfir.do_concurrent.loop
+// CHECK-SAME:        (%[[I_IV:.*]], %[[J_IV:.*]]) = (%[[I_LB]], %[[J_LB]]) to (%[[I_UB]], %[[J_UB]]) step (%[[I_ST]], %[[J_ST]]) reduce(#fir.reduce_attr<add> -> %[[SUM]] : !fir.ref<i32>) {
+// CHECK:             %[[I_IV_CVT:.*]] = fir.convert %[[I_IV]] : (index) -> i32
+// CHECK:             fir.store %[[I_IV_CVT]] to %[[I]] : !fir.ref<i32>
+// CHECK:             %[[J_IV_CVT:.*]] = fir.convert %[[J_IV]] : (index) -> i32
+// CHECK:             fir.store %[[J_IV_CVT]] to %[[J]] : !fir.ref<i32>
+// CHECK:           }
+// CHECK:         }
diff --git a/flang/test/HLFIR/invalid.fir b/flang/test/HLFIR/invalid.fir
index d61efe0062e69..e14284f916bd9 100644
--- a/flang/test/HLFIR/invalid.fir
+++ b/flang/test/HLFIR/invalid.fir
@@ -1555,3 +1555,98 @@ func.func @bad_reshape(%arg0: !hlfir.expr<1x!fir.char<1,2>>, %arg1: !hlfir.expr<
   %0 = hlfir.reshape %arg0 %arg1 pad %arg2 : (!hlfir.expr<1x!fir.char<1,2>>, !hlfir.expr<1xi32>, !hlfir.expr<1x!fir.char<2,?>>) -> !hlfir.expr<?x!fir.char<1,?>>
   return
 }
+
+// -----
+
+func.func @empty_dc_wrapper_body() {
+  // expected-error@+1 {{'hlfir.do_concurrent' op expects a non-empty block}}
+  hlfir.do_concurrent {
+  }
+  return
+}
+
+// -----
+
+func.func @dc_wrong_terminator() {
+  // expected-error@+1 {{'hlfir.do_concurrent' op must be terminated by 'hlfir.do_concurrent.loop'}}
+  hlfir.do_concurrent {
+    llvm.return
+  }
+  return
+}
+
+// -----
+
+func.func @dc_0d() {
+   // expected-error@+2 {{'hlfir.do_concurrent.loop' op needs at least one tuple element for lowerBound, upperBound and step}}
+  hlfir.do_concurrent {
+    hlfir.do_concurrent.loop () = () to () step () {
+      %tmp = fir.alloca i32
+    }
+  }
+  return
+}
+
+// -----
+
+func.func @dc_invalid_parent(%arg0: index, %arg1: index) {
+  // expected-error@+1 {{'hlfir.do_concurrent.loop' op expects parent op 'hlfir.do_concurrent'}}
+  "hlfir.do_concurrent.loop"(%arg0, %arg1) <{operandSegmentSizes = array<i32: 1, 1, 0, 0>}> ({
+  ^bb0(%arg2: index):
+     %tmp = "fir.alloca"() <{in_type = i32, operandSegmentSizes = array<i32: 0, 0>}> : () -> !fir.ref<i32>
+  }) : (index, index) -> ()
+  return
+}
+
+// -----
+
+func.func @dc_invalid_control(%arg0: index, %arg1: index) {
+  // expected-error@+2 {{'hlfir.do_concurrent.loop' op different number of tuple elements for lowerBound, upperBound or step}}
+  hlfir.do_concurrent {
+    "hlfir.do_concurrent.loop"(%arg0, %arg1) <{operandSegmentSizes = array<i32: 1, 1, 0, 0>}> ({
+    ^bb0(%arg2: index):
+      %tmp = "fir.alloca"() <{in_type = i32, operandSegmentSizes = array<i32: 0, 0>}> : () -> !fir.ref<i32>
+    }) : (index, index) -> ()
+  }
+  return
+}
+
+// -----
+
+func.func @dc_invalid_ind_var(%arg0: index, %arg1: index) {
+  // expected-error@+2 {{'hlfir.do_concurrent.loop' op expects the same number of induction variables: 2 as bound and step values: 1}}
+  hlfir.do_concurrent {
+    "hlfir.do_concurrent.loop"(%arg0, %arg1, %arg0) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>}> ({
+    ^bb0(%arg3: index, %arg4: index):
+      %tmp = "fir.alloca"() <{in_type = i32, operandSegmentSizes = array<i32: 0, 0>}> : () -> !fir.ref<i32>
+    }) : (index, index, index) -> ()
+  }
+  return
+}
+
+// -----
+
+func.func @dc_invalid_ind_var_type(%arg0: index, %arg1: index) {
+  // expected-error@+2 {{'hlfir.do_concurrent.loop' op expects arguments for the induction variable to be of index type}}
+  hlfir.do_concurrent {
+    "hlfir.do_concurrent.loop"(%arg0, %arg1, %arg0) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>}> ({
+    ^bb0(%arg3: i32):
+      %tmp = "fir.alloca"() <{in_type = i32, operandSegmentSizes = array<i32: 0, 0>}> : () -> !fir.ref<i32>
+    }) : (index, index, index) -> ()
+  }
+  return
+}
+
+// -----
+
+func.func @dc_invalid_reduction(%arg0: index, %arg1: index) {
+  %sum = fir.alloca i32
+  // expected-error@+2 {{'hlfir.do_concurrent.loop' op mismatch in number of reduction variables and reduction attributes}}
+  hlfir.do_concurrent {
+    "hlfir.do_concurrent.loop"(%arg0, %arg1, %arg0, %sum) <{operandSegmentSizes = array<i32: 1, 1, 1, 1>}> ({
+    ^bb0(%arg3: index):
+      %tmp = "fir.alloca"() <{in_type = i32, operandSegmentSizes = array<i32: 0, 0>}> : () -> !fir.ref<i32>
+    }) : (index, index, index, !fir.ref<i32>) -> ()
+  }
+  return
+}

@ergawy ergawy requested a review from clementval March 12, 2025 04:56

github-actions bot commented Mar 12, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@ergawy ergawy force-pushed the users/ergawy/hlfir_do_concurrent_op_1_mlir_op branch from 0490357 to dd56911 on March 12, 2025 05:04
@ergawy ergawy requested a review from tblah March 12, 2025 05:29
Contributor

@tblah tblah left a comment
Overall LGTM. Please wait for somebody else to have a look before merging.

@ergawy ergawy force-pushed the users/ergawy/hlfir_do_concurrent_op_1_mlir_op branch from dd56911 to dcc9686 on March 12, 2025 10:49
Contributor

@jeanPerier jeanPerier left a comment

Thanks!

The approach looks good to me. A few questions:

  • Why HLFIR and not FIR?

Putting it in HLFIR forces lowering it when moving from HLFIR to FIR, while there seems to be no deep requirement for this operation to be in HLFIR. The main reason for things to be in HLFIR is that they can manipulate hlfir.expr, which makes the so-called "bufferization" the natural transition between HLFIR and FIR, and I am not sure it is best to tie the hlfir.do_concurrent-to-xxx conversion to bufferization (although I understand why you would want to run hlfir.do_concurrent-to-xxx early in the pipeline for some xxx, and that is OK, but I think it is best to preserve the freedom to maintain fir.do_concurrent up to the point when we lower fir.do_loop).

  • What about locality specifiers?

I think you mentioned in the RFC that you intend to add them later to these ops, I just want to check that I got that part right.

@@ -1863,5 +1864,108 @@ def hlfir_EvaluateInMemoryOp : hlfir_Op<"eval_in_mem", [AttrSizedOperandSegments
let hasVerifier = 1;
}

def hlfir_DoConcurrentOp : hlfir_Op<"do_concurrent", [SingleBlock]> {
Contributor

I think this should have the AutomaticAllocationScope interface to "pin" the loop indices allocas to it.

Member Author

Right, done!

@ergawy
Member Author

ergawy commented Mar 13, 2025

* Why HLFIR and not FIR?

Putting it in HLFIR forces lowering it when moving from HLFIR to FIR, while there seems to be no deep requirement for this operation to be in HLFIR. The main reason for things to be in HLFIR is that they can manipulate hlfir.expr, which makes the so-called "bufferization" the natural transition between HLFIR and FIR, and I am not sure it is best to tie the hlfir.do_concurrent-to-xxx conversion to bufferization (although I understand why you would want to run hlfir.do_concurrent-to-xxx early in the pipeline for some xxx, and that is OK, but I think it is best to preserve the freedom to maintain fir.do_concurrent up to the point when we lower fir.do_loop).

Thanks for that context, very useful for me. This was suggested by @kiranchandramohan but it also made sense to me since, in my mind, hlfir.do_concurrent is a high-level model of do concurrent loops that can be lowered to different targets, including the lower-level sibling fir.do_loop. Was HLFIR designed specifically with the hlfir.expr and bufferization points in mind? Or is this just how the dialect has been used so far? If it is the latter, I think it won't be that bad to stretch the purpose/definition of the dialect to: "ops that are closer to Fortran source than the ones in the FIR dialect". I am not too attached to that decision though; I think you guys will have better-informed opinions here.

* What about locality specifiers?

I think you mentioned in the RFC that you intend to add them later to these ops, I just want to check that I got that part right.

Yes, I want to reuse what we have in OpenMP regarding locality specifiers for the Fortran dialects as well. See: #128148. However, I am postponing this until I implement the do_concurrent op first. #128148 is just a PoC; the final aim is to extract the OpenMP TableGen records used in the PoC into a "data environment dialect" used for OpenMP, Fortran, and maybe OpenACC as well. Since I did the PoC with fir.do_loop in mind, I will need to rethink it after finishing up do_concurrent.

@ergawy ergawy force-pushed the users/ergawy/hlfir_do_concurrent_op_1_mlir_op branch from dcc9686 to 2072558 on March 13, 2025 05:01
@kiranchandramohan
Contributor

This was suggested by @kiranchandramohan

It was a suggestion from my side. Please follow @jeanPerier's direction if he has a strong opinion.

@ergawy
Member Author

ergawy commented Mar 14, 2025

This was suggested by @kiranchandramohan

It was a suggestion from my side. Please follow @jeanPerier's direction if he has a strong opinion.

👍 @jeanPerier will move to fir then since that seems consistent with the current dialect definitions. Let me know if you have other thoughts.

@jeanPerier
Contributor

This was suggested by @kiranchandramohan

It was a suggestion from my side. Please follow @jeanPerier's direction if he has a strong opinion.

👍 @jeanPerier will move to fir then since that seems consistent with the current dialect definitions. Let me know if you have other thoughts.

Thanks, that just seems the most flexible approach to me.
I do understand Kiran's advice, based on the fact that HLFIR is meant to be closer to the Fortran representation than FIR, and maybe I should clarify the HLFIR vs FIR distinction somewhere. I really think the hlfir.expr type is the cornerstone of HLFIR, and if that type cannot be an operand or a result of an operation, the operation likely belongs in FIR. HLFIR can be seen as "tensor operations for FIR".

ergawy added 4 commits March 17, 2025 00:44
Adds new MLIR ops to model `do concurrent`. In order to make `do
concurrent` representation self-contained, a loop is modeled using 2
ops, one wrapper and one that contains the actual body of the loop. For
example, a 2D `do concurrent` loop is modeled as follows:

```mlir
  hlfir.do_concurrent {
    %i = fir.alloca i32
    %j = fir.alloca i32
    hlfir.do_concurrent.loop
      (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st) {
      %0 = fir.convert %i_iv : (index) -> i32
      fir.store %0 to %i : !fir.ref<i32>

      %1 = fir.convert %j_iv : (index) -> i32
      fir.store %1 to %j : !fir.ref<i32>
    }
  }
```

The `hlfir.do_concurrent` wrapper op encapsulates both the actual loop
and the allocations required for the iteration variables. The
`hlfir.do_concurrent.loop` op is a multi-dimensional op that contains
the loop control and body. See the ops' docs for more info.
@ergawy ergawy force-pushed the users/ergawy/hlfir_do_concurrent_op_1_mlir_op branch from 2072558 to 1dc58b5 on March 17, 2025 06:16
@ergawy
Member Author

ergawy commented Mar 17, 2025

Thanks all for the comments! Moved the ops to fir. Please take a look when you have time.

@ergawy ergawy changed the title [flang][hlfir] Add MLIR op for do concurrent [flang][fir] Add MLIR op for do concurrent Mar 17, 2025
Contributor

@jeanPerier jeanPerier left a comment
Thanks, LGTM

@ergawy ergawy merged commit 1094ffc into main Mar 18, 2025
12 checks passed
@ergawy ergawy deleted the users/ergawy/hlfir_do_concurrent_op_1_mlir_op branch March 18, 2025 09:53