[flang][fir] Add MLIR op for do concurrent #130893


Merged

merged 4 commits into main on Mar 18, 2025

Conversation

Member

@ergawy ergawy commented Mar 12, 2025

Adds new MLIR ops to model `do concurrent`. To make the `do concurrent` representation self-contained, a loop is modeled using two ops: a wrapper and an op that contains the actual body of the loop. For example, a 2D `do concurrent` loop is modeled as follows:

  fir.do_concurrent {
    %i = fir.alloca i32
    %j = fir.alloca i32
    fir.do_concurrent.loop
      (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st) {
      %0 = fir.convert %i_iv : (index) -> i32
      fir.store %0 to %i : !fir.ref<i32>

      %1 = fir.convert %j_iv : (index) -> i32
      fir.store %1 to %j : !fir.ref<i32>
    }
  }

The fir.do_concurrent wrapper op encapsulates both the actual loop and the allocations required for the iteration variables. The fir.do_concurrent.loop op is a multi-dimensional op that contains the loop control and body. See the ops' docs for more info.
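As discussed during review, one natural lowering target for this wrapper/loop pair is a nest of unordered `fir.do_loop` ops. The following is a hand-written sketch only (not the output of an actual lowering pass), assuming the iteration-variable allocas `%i` and `%j` have been hoisted into the surrounding scope:

```mlir
// Hypothetical lowering of the 2D nest above to unordered fir.do_loop ops.
fir.do_loop %i_iv = %i_lb to %i_ub step %i_st unordered {
  fir.do_loop %j_iv = %j_lb to %j_ub step %j_st unordered {
    %0 = fir.convert %i_iv : (index) -> i32
    fir.store %0 to %i : !fir.ref<i32>
    %1 = fir.convert %j_iv : (index) -> i32
    fir.store %1 to %j : !fir.ref<i32>
  }
}
```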

@ergawy ergawy requested review from jeanPerier and vzakhari March 12, 2025 04:54
@llvmbot llvmbot added the flang (Flang issues not falling into any other category) and flang:fir-hlfir labels Mar 12, 2025
@llvmbot
Member

llvmbot commented Mar 12, 2025

@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)

Changes

Adds new MLIR ops to model `do concurrent`. To make the `do concurrent` representation self-contained, a loop is modeled using two ops: a wrapper and an op that contains the actual body of the loop. For example, a 2D `do concurrent` loop is modeled as follows:

  hlfir.do_concurrent {
    %i = fir.alloca i32
    %j = fir.alloca i32
    hlfir.do_concurrent.loop
      (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st) {
      %0 = fir.convert %i_iv : (index) -> i32
      fir.store %0 to %i : !fir.ref<i32>

      %1 = fir.convert %j_iv : (index) -> i32
      fir.store %1 to %j : !fir.ref<i32>
    }
  }

The hlfir.do_concurrent wrapper op encapsulates both the actual loop and the allocations required for the iteration variables. The hlfir.do_concurrent.loop op is a multi-dimensional op that contains the loop control and body. See the ops' docs for more info.


Full diff: https://github.com/llvm/llvm-project/pull/130893.diff

4 Files Affected:

  • (modified) flang/include/flang/Optimizer/HLFIR/HLFIROps.td (+116)
  • (modified) flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp (+163)
  • (added) flang/test/HLFIR/do_concurrent.fir (+92)
  • (modified) flang/test/HLFIR/invalid.fir (+95)
diff --git a/flang/include/flang/Optimizer/HLFIR/HLFIROps.td b/flang/include/flang/Optimizer/HLFIR/HLFIROps.td
index f69930d5b53b3..089c67af5d313 100644
--- a/flang/include/flang/Optimizer/HLFIR/HLFIROps.td
+++ b/flang/include/flang/Optimizer/HLFIR/HLFIROps.td
@@ -21,6 +21,7 @@ include "flang/Optimizer/Dialect/FIRAttr.td"
 include "flang/Optimizer/Dialect/FortranVariableInterface.td"
 include "mlir/Dialect/Arith/IR/ArithBase.td"
 include "mlir/Dialect/Arith/IR/ArithOpsInterfaces.td"
+include "mlir/Dialect/LLVMIR/LLVMAttrDefs.td"
 include "mlir/IR/BuiltinAttributes.td"
 
 // Base class for FIR operations.
@@ -1863,5 +1864,120 @@ def hlfir_EvaluateInMemoryOp : hlfir_Op<"eval_in_mem", [AttrSizedOperandSegments
   let hasVerifier = 1;
 }
 
+def hlfir_DoConcurrentOp : hlfir_Op<"do_concurrent", [SingleBlock]> {
+  let summary = "do concurrent loop wrapper";
+
+  let description = [{
+    A wrapper operation for the actual op modeling `do concurrent` loops:
+    `hlfir.do_concurrent.loop` (see op declaration below for more info about it).
+
+    The `hlfir.do_concurrent` wrapper op consists of one single-block region with
+    the following properties:
+    - The first ops in the region are responsible for allocating storage for the
+      loop's iteration variables. This property is **not** enforced by the op
+      verifier, but expected to be respected when building the op.
+    - The terminator of the region is an instance of `hlfir.do_concurrent.loop`.
+
+    For example, a 2D loop nest would be represented as follows:
+    ```
+    hlfir.do_concurrent {
+      %i = fir.alloca i32
+      %j = fir.alloca i32
+      hlfir.do_concurrent.loop ...
+    }
+    ```
+  }];
+
+  let regions = (region SizedRegion<1>:$region);
+
+  let assemblyFormat = "$region attr-dict";
+  let hasVerifier = 1;
+}
+
+def hlfir_DoConcurrentLoopOp : hlfir_Op<"do_concurrent.loop",
+    [AttrSizedOperandSegments, DeclareOpInterfaceMethods<LoopLikeOpInterface>,
+     Terminator, NoTerminator, SingleBlock, ParentOneOf<["DoConcurrentOp"]>]> {
+  let summary = "do concurrent loop";
+
+  let description = [{
+    An operation that models a Fortran `do concurrent` loop's header and block.
+    This is a single-region single-block terminator op that is expected to
+    terminate the region of an `hlfir.do_concurrent` wrapper op.
+
+    This op borrows from both `scf.parallel` and `fir.do_loop` ops. Similar to
+    `scf.parallel`, a loop nest takes 3 groups of SSA values as operands that
+    represent the lower bounds, upper bounds, and steps. Similar to `fir.do_loop`,
+    the op takes one additional group of SSA values to represent reductions.
+
+    The body region **does not** have a terminator.
+
+    For example, a 2D loop nest with 2 reductions (sum and max) would be
+    represented as follows:
+    ```
+    // The wrapper of the loop
+    hlfir.do_concurrent {
+      %i = fir.alloca i32
+      %j = fir.alloca i32
+
+      // The actual `do concurrent` loop
+      hlfir.do_concurrent.loop
+        (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st)
+        reduce(#fir.reduce_attr<add> -> %sum : !fir.ref<i32>,
+               #fir.reduce_attr<max> -> %max : !fir.ref<f32>) {
+
+        %0 = fir.convert %i_iv : (index) -> i32
+        fir.store %0 to %i : !fir.ref<i32>
+
+        %1 = fir.convert %j_iv : (index) -> i32
+        fir.store %1 to %j : !fir.ref<i32>
+
+        // ... loop body goes here ...
+      }
+    }
+    ```
+
+    Description of arguments:
+    - `lowerBound`: The group of SSA values for the nest's lower bounds.
+    - `upperBound`: The group of SSA values for the nest's upper bounds.
+    - `step`: The group of SSA values for the nest's steps.
+    - `reduceOperands`: The reduction SSA values, if any.
+    - `reduceAttrs`: Attributes to store reduction operations, if any.
+    - `loopAnnotation`: Loop metadata to be passed down the compiler pipeline to
+      LLVM.
+  }];
+
+  let arguments = (ins
+    Variadic<Index>:$lowerBound,
+    Variadic<Index>:$upperBound,
+    Variadic<Index>:$step,
+    Variadic<AnyType>:$reduceOperands,
+    OptionalAttr<ArrayAttr>:$reduceAttrs,
+    OptionalAttr<LoopAnnotationAttr>:$loopAnnotation
+  );
+
+  let regions = (region SizedRegion<1>:$region);
+
+  let hasCustomAssemblyFormat = 1;
+  let hasVerifier = 1;
+
+  let extraClassDeclaration = [{
+    /// Get Number of variadic operands
+    unsigned getNumOperands(unsigned segmentIdx) {
+      auto segments = (*this)->getAttrOfType<mlir::DenseI32ArrayAttr>(
+        getOperandSegmentSizeAttr());
+      return static_cast<unsigned>(segments[segmentIdx]);
+    }
+
+    // Get Number of reduction operands
+    unsigned getNumReduceOperands() {
+      return getNumOperands(3);
+    }
+
+    /// Does the operation hold operands for reduction variables
+    bool hasReduceOperands() {
+      return getNumReduceOperands() > 0;
+    }
+  }];
+}
 
 #endif // FORTRAN_DIALECT_HLFIR_OPS
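The `operandSegmentSizes` bookkeeping used by `getNumOperands`/`getNumReduceOperands` above can be illustrated without MLIR. A minimal standalone C++ sketch (the `SegmentSizes` helper is hypothetical; the real op reads a `DenseI32ArrayAttr` instead):

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <numeric>

// Stand-in for the dense-i32-array attribute stored under
// "operandSegmentSizes": one size per variadic operand group, in
// declaration order: lowerBound, upperBound, step, reduceOperands.
struct SegmentSizes {
  std::array<int32_t, 4> sizes;

  // Mirrors DoConcurrentLoopOp::getNumOperands(segmentIdx).
  unsigned numOperands(unsigned segmentIdx) const {
    return static_cast<unsigned>(sizes[segmentIdx]);
  }

  // Mirrors getNumReduceOperands(): reductions are segment 3.
  unsigned numReduceOperands() const { return numOperands(3); }

  // Offset of a segment's first operand in the op's flat operand list.
  unsigned segmentOffset(unsigned segmentIdx) const {
    return std::accumulate(sizes.begin(), sizes.begin() + segmentIdx, 0u);
  }
};
```

For a 2D nest with one reduction the sizes would be {2, 2, 2, 1}, so the single reduce operand sits at flat operand index 6.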
diff --git a/flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp b/flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp
index 8851a3a7187b9..576dc601aff14 100644
--- a/flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp
+++ b/flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp
@@ -12,6 +12,7 @@
 
 #include "flang/Optimizer/HLFIR/HLFIROps.h"
 
+#include "flang/Optimizer/Dialect/FIRAttr.h"
 #include "flang/Optimizer/Dialect/FIROpsSupport.h"
 #include "flang/Optimizer/Dialect/FIRType.h"
 #include "flang/Optimizer/Dialect/Support/FIRContext.h"
@@ -2246,6 +2247,168 @@ llvm::LogicalResult hlfir::EvaluateInMemoryOp::verify() {
   return mlir::success();
 }
 
+//===----------------------------------------------------------------------===//
+// DoConcurrentOp
+//===----------------------------------------------------------------------===//
+
+llvm::LogicalResult hlfir::DoConcurrentOp::verify() {
+  mlir::Block *body = getBody();
+
+  if (body->empty())
+    return emitOpError("body cannot be empty");
+
+  if (!body->mightHaveTerminator() ||
+      !mlir::isa<hlfir::DoConcurrentLoopOp>(body->getTerminator()))
+    return emitOpError("must be terminated by 'hlfir.do_concurrent.loop'");
+
+  return mlir::success();
+}
+
+//===----------------------------------------------------------------------===//
+// DoConcurrentLoopOp
+//===----------------------------------------------------------------------===//
+
+mlir::ParseResult
+hlfir::DoConcurrentLoopOp::parse(mlir::OpAsmParser &parser,
+                                 mlir::OperationState &result) {
+  auto &builder = parser.getBuilder();
+  // Parse an opening `(` followed by induction variables followed by `)`
+  llvm::SmallVector<mlir::OpAsmParser::Argument, 4> ivs;
+  if (parser.parseArgumentList(ivs, mlir::OpAsmParser::Delimiter::Paren))
+    return mlir::failure();
+
+  // Parse loop bounds.
+  llvm::SmallVector<mlir::OpAsmParser::UnresolvedOperand, 4> lower;
+  if (parser.parseEqual() ||
+      parser.parseOperandList(lower, ivs.size(),
+                              mlir::OpAsmParser::Delimiter::Paren) ||
+      parser.resolveOperands(lower, builder.getIndexType(), result.operands))
+    return mlir::failure();
+
+  llvm::SmallVector<mlir::OpAsmParser::UnresolvedOperand, 4> upper;
+  if (parser.parseKeyword("to") ||
+      parser.parseOperandList(upper, ivs.size(),
+                              mlir::OpAsmParser::Delimiter::Paren) ||
+      parser.resolveOperands(upper, builder.getIndexType(), result.operands))
+    return mlir::failure();
+
+  // Parse step values.
+  llvm::SmallVector<mlir::OpAsmParser::UnresolvedOperand, 4> steps;
+  if (parser.parseKeyword("step") ||
+      parser.parseOperandList(steps, ivs.size(),
+                              mlir::OpAsmParser::Delimiter::Paren) ||
+      parser.resolveOperands(steps, builder.getIndexType(), result.operands))
+    return mlir::failure();
+
+  llvm::SmallVector<mlir::OpAsmParser::UnresolvedOperand> reduceOperands;
+  llvm::SmallVector<mlir::Type> reduceArgTypes;
+  if (succeeded(parser.parseOptionalKeyword("reduce"))) {
+    // Parse reduction attributes and variables.
+    llvm::SmallVector<fir::ReduceAttr> attributes;
+    if (failed(parser.parseCommaSeparatedList(
+            mlir::AsmParser::Delimiter::Paren, [&]() {
+              if (parser.parseAttribute(attributes.emplace_back()) ||
+                  parser.parseArrow() ||
+                  parser.parseOperand(reduceOperands.emplace_back()) ||
+                  parser.parseColonType(reduceArgTypes.emplace_back()))
+                return mlir::failure();
+              return mlir::success();
+            })))
+      return mlir::failure();
+    // Resolve input operands.
+    for (auto operand_type : llvm::zip(reduceOperands, reduceArgTypes))
+      if (parser.resolveOperand(std::get<0>(operand_type),
+                                std::get<1>(operand_type), result.operands))
+        return mlir::failure();
+    llvm::SmallVector<mlir::Attribute> arrayAttr(attributes.begin(),
+                                                 attributes.end());
+    result.addAttribute(getReduceAttrsAttrName(result.name),
+                        builder.getArrayAttr(arrayAttr));
+  }
+
+  // Now parse the body.
+  mlir::Region *body = result.addRegion();
+  for (auto &iv : ivs)
+    iv.type = builder.getIndexType();
+  if (parser.parseRegion(*body, ivs))
+    return mlir::failure();
+
+  // Set `operandSegmentSizes` attribute.
+  result.addAttribute(DoConcurrentLoopOp::getOperandSegmentSizeAttr(),
+                      builder.getDenseI32ArrayAttr(
+                          {static_cast<int32_t>(lower.size()),
+                           static_cast<int32_t>(upper.size()),
+                           static_cast<int32_t>(steps.size()),
+                           static_cast<int32_t>(reduceOperands.size())}));
+
+  // Parse attributes.
+  if (parser.parseOptionalAttrDict(result.attributes))
+    return mlir::failure();
+
+  return mlir::success();
+}
+
+void hlfir::DoConcurrentLoopOp::print(mlir::OpAsmPrinter &p) {
+  p << " (" << getBody()->getArguments() << ") = (" << getLowerBound()
+    << ") to (" << getUpperBound() << ") step (" << getStep() << ")";
+
+  if (hasReduceOperands()) {
+    p << " reduce(";
+    auto attrs = getReduceAttrsAttr();
+    auto operands = getReduceOperands();
+    llvm::interleaveComma(llvm::zip(attrs, operands), p, [&](auto it) {
+      p << std::get<0>(it) << " -> " << std::get<1>(it) << " : "
+        << std::get<1>(it).getType();
+    });
+    p << ')';
+  }
+
+  p << ' ';
+  p.printRegion(getRegion(), /*printEntryBlockArgs=*/false);
+  p.printOptionalAttrDict(
+      (*this)->getAttrs(),
+      /*elidedAttrs=*/{DoConcurrentLoopOp::getOperandSegmentSizeAttr(),
+                       DoConcurrentLoopOp::getReduceAttrsAttrName()});
+}
+
+llvm::SmallVector<mlir::Region *> hlfir::DoConcurrentLoopOp::getLoopRegions() {
+  return {&getRegion()};
+}
+
+llvm::LogicalResult hlfir::DoConcurrentLoopOp::verify() {
+  mlir::Operation::operand_range lbValues = getLowerBound();
+  mlir::Operation::operand_range ubValues = getUpperBound();
+  mlir::Operation::operand_range stepValues = getStep();
+
+  if (lbValues.empty())
+    return emitOpError(
+        "needs at least one tuple element for lowerBound, upperBound and step");
+
+  if (lbValues.size() != ubValues.size() ||
+      ubValues.size() != stepValues.size())
+    return emitOpError(
+        "different number of tuple elements for lowerBound, upperBound or step");
+
+  // Check that the body defines the same number of block arguments as the
+  // number of tuple elements in step.
+  mlir::Block *body = getBody();
+  if (body->getNumArguments() != stepValues.size())
+    return emitOpError() << "expects the same number of induction variables: "
+                         << body->getNumArguments()
+                         << " as bound and step values: " << stepValues.size();
+  for (auto arg : body->getArguments())
+    if (!arg.getType().isIndex())
+      return emitOpError(
+          "expects arguments for the induction variable to be of index type");
+
+  auto reduceAttrs = getReduceAttrsAttr();
+  if (getNumReduceOperands() != (reduceAttrs ? reduceAttrs.size() : 0))
+    return emitOpError(
+        "mismatch in number of reduction variables and reduction attributes");
+
+  return mlir::success();
+}
+
 #include "flang/Optimizer/HLFIR/HLFIROpInterfaces.cpp.inc"
 #define GET_OP_CLASSES
 #include "flang/Optimizer/HLFIR/HLFIREnums.cpp.inc"
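The structural checks performed by `DoConcurrentLoopOp::verify` above boil down to a handful of arity comparisons. A standalone C++ distillation (the `verifyLoopControl` helper is hypothetical and has no MLIR dependency; it returns the diagnostic text, or `std::nullopt` on success):

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Distilled version of DoConcurrentLoopOp::verify(): only the count
// checks, since the type and parent-op checks need real MLIR types.
std::optional<std::string>
verifyLoopControl(std::size_t numLbs, std::size_t numUbs, std::size_t numSteps,
                  std::size_t numIVs, std::size_t numReduceOperands,
                  std::size_t numReduceAttrs) {
  if (numLbs == 0)
    return "needs at least one tuple element for lowerBound, upperBound and step";
  if (numLbs != numUbs || numUbs != numSteps)
    return "different number of tuple elements for lowerBound, upperBound or step";
  if (numIVs != numSteps)
    return "expects the same number of induction variables as bound and step values";
  if (numReduceOperands != numReduceAttrs)
    return "mismatch in number of reduction variables and reduction attributes";
  return std::nullopt;
}
```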
diff --git a/flang/test/HLFIR/do_concurrent.fir b/flang/test/HLFIR/do_concurrent.fir
new file mode 100644
index 0000000000000..aef9db2236a57
--- /dev/null
+++ b/flang/test/HLFIR/do_concurrent.fir
@@ -0,0 +1,92 @@
+// Test hlfir.do_concurrent operation parse, verify (no errors), and unparse
+
+// RUN: fir-opt %s | fir-opt | FileCheck %s
+
+func.func @dc_1d(%i_lb: index, %i_ub: index, %i_st: index) {
+  hlfir.do_concurrent {
+    %i = fir.alloca i32
+    hlfir.do_concurrent.loop (%i_iv) = (%i_lb) to (%i_ub) step (%i_st) {
+      %0 = fir.convert %i_iv : (index) -> i32
+      fir.store %0 to %i : !fir.ref<i32>
+    }
+  }
+  return
+}
+
+// CHECK-LABEL: func.func @dc_1d
+// CHECK-SAME:    (%[[I_LB:.*]]: index, %[[I_UB:.*]]: index, %[[I_ST:.*]]: index)
+// CHECK:         hlfir.do_concurrent {
+// CHECK:           %[[I:.*]] = fir.alloca i32
+// CHECK:           hlfir.do_concurrent.loop (%[[I_IV:.*]]) = (%[[I_LB]]) to (%[[I_UB]]) step (%[[I_ST]]) {
+// CHECK:             %[[I_IV_CVT:.*]] = fir.convert %[[I_IV]] : (index) -> i32
+// CHECK:             fir.store %[[I_IV_CVT]] to %[[I]] : !fir.ref<i32>
+// CHECK:           }
+// CHECK:         }
+
+func.func @dc_2d(%i_lb: index, %i_ub: index, %i_st: index,
+                 %j_lb: index, %j_ub: index, %j_st: index) {
+  hlfir.do_concurrent {
+    %i = fir.alloca i32
+    %j = fir.alloca i32
+    hlfir.do_concurrent.loop
+      (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st) {
+      %0 = fir.convert %i_iv : (index) -> i32
+      fir.store %0 to %i : !fir.ref<i32>
+
+      %1 = fir.convert %j_iv : (index) -> i32
+      fir.store %1 to %j : !fir.ref<i32>
+    }
+  }
+  return
+}
+
+// CHECK-LABEL: func.func @dc_2d
+// CHECK-SAME:    (%[[I_LB:.*]]: index, %[[I_UB:.*]]: index, %[[I_ST:.*]]: index, %[[J_LB:.*]]: index, %[[J_UB:.*]]: index, %[[J_ST:.*]]: index)
+// CHECK:         hlfir.do_concurrent {
+// CHECK:           %[[I:.*]] = fir.alloca i32
+// CHECK:           %[[J:.*]] = fir.alloca i32
+// CHECK:           hlfir.do_concurrent.loop
+// CHECK-SAME:        (%[[I_IV:.*]], %[[J_IV:.*]]) = (%[[I_LB]], %[[J_LB]]) to (%[[I_UB]], %[[J_UB]]) step (%[[I_ST]], %[[J_ST]]) {
+// CHECK:             %[[I_IV_CVT:.*]] = fir.convert %[[I_IV]] : (index) -> i32
+// CHECK:             fir.store %[[I_IV_CVT]] to %[[I]] : !fir.ref<i32>
+// CHECK:             %[[J_IV_CVT:.*]] = fir.convert %[[J_IV]] : (index) -> i32
+// CHECK:             fir.store %[[J_IV_CVT]] to %[[J]] : !fir.ref<i32>
+// CHECK:           }
+// CHECK:         }
+
+func.func @dc_2d_reduction(%i_lb: index, %i_ub: index, %i_st: index,
+                 %j_lb: index, %j_ub: index, %j_st: index) {
+  %sum = fir.alloca i32
+
+  hlfir.do_concurrent {
+    %i = fir.alloca i32
+    %j = fir.alloca i32
+    hlfir.do_concurrent.loop
+      (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st) 
+      reduce(#fir.reduce_attr<add> -> %sum : !fir.ref<i32>) {
+      %0 = fir.convert %i_iv : (index) -> i32
+      fir.store %0 to %i : !fir.ref<i32>
+
+      %1 = fir.convert %j_iv : (index) -> i32
+      fir.store %1 to %j : !fir.ref<i32>
+    }
+  }
+  return
+}
+
+// CHECK-LABEL: func.func @dc_2d_reduction
+// CHECK-SAME:    (%[[I_LB:.*]]: index, %[[I_UB:.*]]: index, %[[I_ST:.*]]: index, %[[J_LB:.*]]: index, %[[J_UB:.*]]: index, %[[J_ST:.*]]: index)
+
+// CHECK:         %[[SUM:.*]] = fir.alloca i32
+
+// CHECK:         hlfir.do_concurrent {
+// CHECK:           %[[I:.*]] = fir.alloca i32
+// CHECK:           %[[J:.*]] = fir.alloca i32
+// CHECK:           hlfir.do_concurrent.loop
+// CHECK-SAME:        (%[[I_IV:.*]], %[[J_IV:.*]]) = (%[[I_LB]], %[[J_LB]]) to (%[[I_UB]], %[[J_UB]]) step (%[[I_ST]], %[[J_ST]]) reduce(#fir.reduce_attr<add> -> %[[SUM]] : !fir.ref<i32>) {
+// CHECK:             %[[I_IV_CVT:.*]] = fir.convert %[[I_IV]] : (index) -> i32
+// CHECK:             fir.store %[[I_IV_CVT]] to %[[I]] : !fir.ref<i32>
+// CHECK:             %[[J_IV_CVT:.*]] = fir.convert %[[J_IV]] : (index) -> i32
+// CHECK:             fir.store %[[J_IV_CVT]] to %[[J]] : !fir.ref<i32>
+// CHECK:           }
+// CHECK:         }
diff --git a/flang/test/HLFIR/invalid.fir b/flang/test/HLFIR/invalid.fir
index d61efe0062e69..e14284f916bd9 100644
--- a/flang/test/HLFIR/invalid.fir
+++ b/flang/test/HLFIR/invalid.fir
@@ -1555,3 +1555,98 @@ func.func @bad_reshape(%arg0: !hlfir.expr<1x!fir.char<1,2>>, %arg1: !hlfir.expr<
   %0 = hlfir.reshape %arg0 %arg1 pad %arg2 : (!hlfir.expr<1x!fir.char<1,2>>, !hlfir.expr<1xi32>, !hlfir.expr<1x!fir.char<2,?>>) -> !hlfir.expr<?x!fir.char<1,?>>
   return
 }
+
+// -----
+
+func.func @empty_dc_wrapper_body() {
+  // expected-error@+1 {{'hlfir.do_concurrent' op expects a non-empty block}}
+  hlfir.do_concurrent {
+  }
+  return
+}
+
+// -----
+
+func.func @dc_wrong_terminator() {
+  // expected-error@+1 {{'hlfir.do_concurrent' op must be terminated by 'hlfir.do_concurrent.loop'}}
+  hlfir.do_concurrent {
+    llvm.return
+  }
+  return
+}
+
+// -----
+
+func.func @dc_0d() {
+   // expected-error@+2 {{'hlfir.do_concurrent.loop' op needs at least one tuple element for lowerBound, upperBound and step}}
+  hlfir.do_concurrent {
+    hlfir.do_concurrent.loop () = () to () step () {
+      %tmp = fir.alloca i32
+    }
+  }
+  return
+}
+
+// -----
+
+func.func @dc_invalid_parent(%arg0: index, %arg1: index) {
+  // expected-error@+1 {{'hlfir.do_concurrent.loop' op expects parent op 'hlfir.do_concurrent'}}
+  "hlfir.do_concurrent.loop"(%arg0, %arg1) <{operandSegmentSizes = array<i32: 1, 1, 0, 0>}> ({
+  ^bb0(%arg2: index):
+     %tmp = "fir.alloca"() <{in_type = i32, operandSegmentSizes = array<i32: 0, 0>}> : () -> !fir.ref<i32>
+  }) : (index, index) -> ()
+  return
+}
+
+// -----
+
+func.func @dc_invalid_control(%arg0: index, %arg1: index) {
+  // expected-error@+2 {{'hlfir.do_concurrent.loop' op different number of tuple elements for lowerBound, upperBound or step}}
+  hlfir.do_concurrent {
+    "hlfir.do_concurrent.loop"(%arg0, %arg1) <{operandSegmentSizes = array<i32: 1, 1, 0, 0>}> ({
+    ^bb0(%arg2: index):
+      %tmp = "fir.alloca"() <{in_type = i32, operandSegmentSizes = array<i32: 0, 0>}> : () -> !fir.ref<i32>
+    }) : (index, index) -> ()
+  }
+  return
+}
+
+// -----
+
+func.func @dc_invalid_ind_var(%arg0: index, %arg1: index) {
+  // expected-error@+2 {{'hlfir.do_concurrent.loop' op expects the same number of induction variables: 2 as bound and step values: 1}}
+  hlfir.do_concurrent {
+    "hlfir.do_concurrent.loop"(%arg0, %arg1, %arg0) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>}> ({
+    ^bb0(%arg3: index, %arg4: index):
+      %tmp = "fir.alloca"() <{in_type = i32, operandSegmentSizes = array<i32: 0, 0>}> : () -> !fir.ref<i32>
+    }) : (index, index, index) -> ()
+  }
+  return
+}
+
+// -----
+
+func.func @dc_invalid_ind_var_type(%arg0: index, %arg1: index) {
+  // expected-error@+2 {{'hlfir.do_concurrent.loop' op expects arguments for the induction variable to be of index type}}
+  hlfir.do_concurrent {
+    "hlfir.do_concurrent.loop"(%arg0, %arg1, %arg0) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>}> ({
+    ^bb0(%arg3: i32):
+      %tmp = "fir.alloca"() <{in_type = i32, operandSegmentSizes = array<i32: 0, 0>}> : () -> !fir.ref<i32>
+    }) : (index, index, index) -> ()
+  }
+  return
+}
+
+// -----
+
+func.func @dc_invalid_reduction(%arg0: index, %arg1: index) {
+  %sum = fir.alloca i32
+  // expected-error@+2 {{'hlfir.do_concurrent.loop' op mismatch in number of reduction variables and reduction attributes}}
+  hlfir.do_concurrent {
+    "hlfir.do_concurrent.loop"(%arg0, %arg1, %arg0, %sum) <{operandSegmentSizes = array<i32: 1, 1, 1, 1>}> ({
+    ^bb0(%arg3: index):
+      %tmp = "fir.alloca"() <{in_type = i32, operandSegmentSizes = array<i32: 0, 0>}> : () -> !fir.ref<i32>
+    }) : (index, index, index, !fir.ref<i32>) -> ()
+  }
+  return
+}

@ergawy ergawy requested a review from clementval March 12, 2025 04:56

github-actions bot commented Mar 12, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@ergawy ergawy force-pushed the users/ergawy/hlfir_do_concurrent_op_1_mlir_op branch from 0490357 to dd56911 on March 12, 2025 05:04
@ergawy ergawy requested a review from tblah March 12, 2025 05:29
Contributor

@tblah tblah left a comment
Overall LGTM. Please wait for somebody else to have a look before merging.

@ergawy ergawy force-pushed the users/ergawy/hlfir_do_concurrent_op_1_mlir_op branch from dd56911 to dcc9686 on March 12, 2025 10:49
Contributor

@jeanPerier jeanPerier left a comment

Thanks!

The approach looks good to me. A few questions:

  • Why HLFIR and not FIR?

Putting it in HLFIR forces lowering it when moving from HLFIR to FIR, while there seems to be no deep requirement for this operation to be in HLFIR. The main reason for things to be in HLFIR is that they can manipulate hlfir.expr, which makes the so-called "bufferization" the natural transition between HLFIR and FIR, and I am not sure it is best to tie the hlfir.do_concurrent-to-xxx conversion to bufferization (although I understand why you would want to run hlfir.do_concurrent-to-xxx early in the pipeline for some xxx, and that is OK, but I think it is best to preserve the freedom to maintain fir.do_concurrent up to the point when we lower fir.do_loop).

  • What about locality specifiers?

I think you mentioned in the RFC that you intend to add them later to these ops, I just want to check that I got that part right.

@@ -1863,5 +1864,108 @@ def hlfir_EvaluateInMemoryOp : hlfir_Op<"eval_in_mem", [AttrSizedOperandSegments
let hasVerifier = 1;
}

def hlfir_DoConcurrentOp : hlfir_Op<"do_concurrent", [SingleBlock]> {
Contributor

I think this should have the AutomaticAllocationScope interface to "pin" the loop indices allocas to it.

Member Author

Right, done!

@ergawy
Member Author

ergawy commented Mar 13, 2025

* Why HLFIR and not FIR?

Putting it in HLFIR forces lowering it when moving from HLFIR to FIR, while there seems to be no deep requirement for this operation to be in HLFIR. The main reason for things to be in HLFIR is that they can manipulate hlfir.expr, which makes the so-called "bufferization" the natural transition between HLFIR and FIR, and I am not sure it is best to tie the hlfir.do_concurrent-to-xxx conversion to bufferization (although I understand why you would want to run hlfir.do_concurrent-to-xxx early in the pipeline for some xxx, and that is OK, but I think it is best to preserve the freedom to maintain fir.do_concurrent up to the point when we lower fir.do_loop).

Thanks for that context, very useful for me. This was suggested by @kiranchandramohan but it also made sense to me since, in my mind, hlfir.do_concurrent is a high-level model of do concurrent loops that can be lowered to different targets, including the lower-level sibling fir.do_loop. Was HLFIR designed specifically with the hlfir.expr and bufferization points in mind? Or is this just how the dialect has been used so far? If it is the latter, I think it won't be that bad to stretch the purpose/definition of the dialect to: "ops that are closer to Fortran source than the ones in the FIR dialect". I am not too attached to that decision though; I think you guys will have better-informed opinions here.

* What about locality specifiers?

I think you mentioned in the RFC that you intend to add them later to these ops, I just want to check that I got that part right.

Yes, I want to reuse what we have in OpenMP regarding locality specifiers for the Fortran dialects as well. See: #128148. However, I am postponing this until I implement the do_concurrent op first. #128148 is just a PoC; the final aim is to extract the OpenMP TableGen records used in the PoC into a "data environment dialect" used for OpenMP, Fortran, and maybe OpenACC as well. Since I did the PoC with fir.do_loop in mind, I will need to rethink it after finishing up do_concurrent.

@ergawy ergawy force-pushed the users/ergawy/hlfir_do_concurrent_op_1_mlir_op branch from dcc9686 to 2072558 on March 13, 2025 05:01
@kiranchandramohan
Contributor

This was suggested by @kiranchandramohan

It was a suggestion from my side. Please follow @jeanPerier's direction if he has a strong opinion.

@ergawy
Member Author

ergawy commented Mar 14, 2025

This was suggested by @kiranchandramohan

It was a suggestion from my side. Please follow @jeanPerier's direction if he has a strong opinion.

👍 @jeanPerier will move to fir then since that seems consistent with the current dialect definitions. Let me know if you have other thoughts.

@jeanPerier
Contributor

This was suggested by @kiranchandramohan

It was a suggestion from my side. Please follow @jeanPerier's direction if he has a strong opinion.

👍 @jeanPerier will move to fir then since that seems consistent with the current dialect definitions. Let me know if you have other thoughts.

Thanks, that just seems the most flexible approach to me.
I do understand Kiran's advice, based on the fact that HLFIR is meant to be closer to the Fortran representation than FIR, and maybe I should clarify the HLFIR vs FIR distinction somewhere. I really think the hlfir.expr type is the cornerstone of HLFIR, and if that type cannot be an operand or a result of an operation, the operation likely belongs in FIR. HLFIR can be seen as "tensor operations for FIR".

ergawy added 4 commits March 17, 2025 00:44
Adds new MLIR ops to model `do concurrent`. In order to make `do
concurrent` representation self-contained, a loop is modeled using 2
ops, one wrapper and one that contains the actual body of the loop. For
example, a 2D `do concurrent` loop is modeled as follows:

```mlir
  hlfir.do_concurrent {
    %i = fir.alloca i32
    %j = fir.alloca i32
    hlfir.do_concurrent.loop
      (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st) {
      %0 = fir.convert %i_iv : (index) -> i32
      fir.store %0 to %i : !fir.ref<i32>

      %1 = fir.convert %j_iv : (index) -> i32
      fir.store %1 to %j : !fir.ref<i32>
    }
  }
```

The `hlfir.do_concurrent` wrapper op encapsulates both the actual loop
and the allocations required for the iteration variables. The
`hlfir.do_concurrent.loop` op is a multi-dimensional op that contains
the loop control and body. See the ops' docs for more info.
@ergawy ergawy force-pushed the users/ergawy/hlfir_do_concurrent_op_1_mlir_op branch from 2072558 to 1dc58b5 on March 17, 2025 06:16
@ergawy
Member Author

ergawy commented Mar 17, 2025

Thanks all for the comments! Moved the ops to fir. Please take a look when you have time.

@ergawy ergawy changed the title [flang][hlfir] Add MLIR op for do concurrent [flang][fir] Add MLIR op for do concurrent Mar 17, 2025
Contributor

@jeanPerier jeanPerier left a comment
Thanks, LGTM

@ergawy ergawy merged commit 1094ffc into main Mar 18, 2025
12 checks passed
@ergawy ergawy deleted the users/ergawy/hlfir_do_concurrent_op_1_mlir_op branch March 18, 2025 09:53