[flang][fir] Add MLIR op for do concurrent
#130893
Conversation
@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)

Changes

Adds new MLIR ops to model `do concurrent`. In order to make the `do concurrent` representation self-contained, a loop is modeled using 2 ops, one wrapper and one that contains the actual body of the loop. For example, a 2D `do concurrent` loop is modeled as follows:

```mlir
hlfir.do_concurrent {
  %i = fir.alloca i32
  %j = fir.alloca i32

  hlfir.do_concurrent.loop
      (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st) {
    %0 = fir.convert %i_iv : (index) -> i32
    fir.store %0 to %i : !fir.ref<i32>

    %1 = fir.convert %j_iv : (index) -> i32
    fir.store %1 to %j : !fir.ref<i32>
  }
}
```

The `hlfir.do_concurrent` wrapper op encapsulates both the actual loop and the allocations required for the iteration variables. The `hlfir.do_concurrent.loop` op is a multi-dimensional op that contains the loop control and body. See the ops' docs for more info.

Full diff: https://github.com/llvm/llvm-project/pull/130893.diff

4 Files Affected:
diff --git a/flang/include/flang/Optimizer/HLFIR/HLFIROps.td b/flang/include/flang/Optimizer/HLFIR/HLFIROps.td
index f69930d5b53b3..089c67af5d313 100644
--- a/flang/include/flang/Optimizer/HLFIR/HLFIROps.td
+++ b/flang/include/flang/Optimizer/HLFIR/HLFIROps.td
@@ -21,6 +21,7 @@ include "flang/Optimizer/Dialect/FIRAttr.td"
include "flang/Optimizer/Dialect/FortranVariableInterface.td"
include "mlir/Dialect/Arith/IR/ArithBase.td"
include "mlir/Dialect/Arith/IR/ArithOpsInterfaces.td"
+include "mlir/Dialect/LLVMIR/LLVMAttrDefs.td"
include "mlir/IR/BuiltinAttributes.td"
// Base class for FIR operations.
@@ -1863,5 +1864,120 @@ def hlfir_EvaluateInMemoryOp : hlfir_Op<"eval_in_mem", [AttrSizedOperandSegments
let hasVerifier = 1;
}
+def hlfir_DoConcurrentOp : hlfir_Op<"do_concurrent", [SingleBlock]> {
+ let summary = "do concurrent loop wrapper";
+
+ let description = [{
+ A wrapper operation for the actual op modeling `do concurrent` loops:
+ `hlfir.do_concurrent.loop` (see op declaration below for more info about it).
+
+ The `hlfir.do_concurrent` wrapper op consists of one single-block region with
+ the following properties:
+ - The first ops in the region are responsible for allocating storage for the
+ loop's iteration variables. This property is **not** enforced by the op
+ verifier, but expected to be respected when building the op.
+ - The terminator of the region is an instance of `hlfir.do_concurrent.loop`.
+
+ For example, a 2D loop nest would be represented as follows:
+ ```
+ hlfir.do_concurrent {
+ %i = fir.alloca i32
+ %j = fir.alloca i32
+ hlfir.do_concurrent.loop ...
+ }
+ ```
+ }];
+
+ let regions = (region SizedRegion<1>:$region);
+
+ let assemblyFormat = "$region attr-dict";
+ let hasVerifier = 1;
+}
+
+def hlfir_DoConcurrentLoopOp : hlfir_Op<"do_concurrent.loop",
+ [AttrSizedOperandSegments, DeclareOpInterfaceMethods<LoopLikeOpInterface>,
+ Terminator, NoTerminator, SingleBlock, ParentOneOf<["DoConcurrentOp"]>]> {
+ let summary = "do concurrent loop";
+
+ let description = [{
+ An operation that models a Fortran `do concurrent` loop's header and block.
+ This is a single-region single-block terminator op that is expected to
+ terminate the region of an `hlfir.do_concurrent` wrapper op.
+
+ This op borrows from both `scf.parallel` and `fir.do_loop` ops. Similar to
+ `scf.parallel`, a loop nest takes 3 groups of SSA values as operands that
+ represent the lower bounds, upper bounds, and steps. Similar to `fir.do_loop`,
+ the op takes one additional group of SSA values to represent reductions.
+
+ The body region **does not** have a terminator.
+
+ For example, a 2D loop nest with 2 reductions (sum and max) would be
+ represented as follows:
+ ```
+ // The wrapper of the loop
+ hlfir.do_concurrent {
+ %i = fir.alloca i32
+ %j = fir.alloca i32
+
+ // The actual `do concurrent` loop
+ hlfir.do_concurrent.loop
+ (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st)
+ reduce(#fir.reduce_attr<add> -> %sum : !fir.ref<i32>,
+ #fir.reduce_attr<max> -> %max : !fir.ref<f32>) {
+
+ %0 = fir.convert %i_iv : (index) -> i32
+ fir.store %0 to %i : !fir.ref<i32>
+
+ %1 = fir.convert %j_iv : (index) -> i32
+ fir.store %1 to %j : !fir.ref<i32>
+
+ // ... loop body goes here ...
+ }
+ }
+ ```
+
+ Description of arguments:
+ - `lowerBound`: The group of SSA values for the nest's lower bounds.
+ - `upperBound`: The group of SSA values for the nest's upper bounds.
+ - `step`: The group of SSA values for the nest's steps.
+ - `reduceOperands`: The reduction SSA values, if any.
+ - `reduceAttrs`: Attributes to store reduction operations, if any.
+ - `loopAnnotation`: Loop metadata to be passed down the compiler pipeline to
+ LLVM.
+ }];
+
+ let arguments = (ins
+ Variadic<Index>:$lowerBound,
+ Variadic<Index>:$upperBound,
+ Variadic<Index>:$step,
+ Variadic<AnyType>:$reduceOperands,
+ OptionalAttr<ArrayAttr>:$reduceAttrs,
+ OptionalAttr<LoopAnnotationAttr>:$loopAnnotation
+ );
+
+ let regions = (region SizedRegion<1>:$region);
+
+ let hasCustomAssemblyFormat = 1;
+ let hasVerifier = 1;
+
+ let extraClassDeclaration = [{
+ /// Get the number of operands in the given variadic segment.
+ unsigned getNumOperands(unsigned segmentIdx) {
+ auto segments = (*this)->getAttrOfType<mlir::DenseI32ArrayAttr>(
+ getOperandSegmentSizeAttr());
+ return static_cast<unsigned>(segments[segmentIdx]);
+ }
+
+ /// Get the number of reduction operands.
+ unsigned getNumReduceOperands() {
+ return getNumOperands(3);
+ }
+
+ /// Does the operation hold operands for reduction variables?
+ bool hasReduceOperands() {
+ return getNumReduceOperands() > 0;
+ }
+ }];
+}
#endif // FORTRAN_DIALECT_HLFIR_OPS
diff --git a/flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp b/flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp
index 8851a3a7187b9..576dc601aff14 100644
--- a/flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp
+++ b/flang/lib/Optimizer/HLFIR/IR/HLFIROps.cpp
@@ -12,6 +12,7 @@
#include "flang/Optimizer/HLFIR/HLFIROps.h"
+#include "flang/Optimizer/Dialect/FIRAttr.h"
#include "flang/Optimizer/Dialect/FIROpsSupport.h"
#include "flang/Optimizer/Dialect/FIRType.h"
#include "flang/Optimizer/Dialect/Support/FIRContext.h"
@@ -2246,6 +2247,168 @@ llvm::LogicalResult hlfir::EvaluateInMemoryOp::verify() {
return mlir::success();
}
+//===----------------------------------------------------------------------===//
+// DoConcurrentOp
+//===----------------------------------------------------------------------===//
+
+llvm::LogicalResult hlfir::DoConcurrentOp::verify() {
+ mlir::Block *body = getBody();
+
+ if (body->empty())
+ return emitOpError("body cannot be empty");
+
+ if (!body->mightHaveTerminator() ||
+ !mlir::isa<hlfir::DoConcurrentLoopOp>(body->getTerminator()))
+ return emitOpError("must be terminated by 'hlfir.do_concurrent.loop'");
+
+ return mlir::success();
+}
+
+//===----------------------------------------------------------------------===//
+// DoConcurrentLoopOp
+//===----------------------------------------------------------------------===//
+
+mlir::ParseResult
+hlfir::DoConcurrentLoopOp::parse(mlir::OpAsmParser &parser,
+ mlir::OperationState &result) {
+ auto &builder = parser.getBuilder();
+ // Parse an opening `(` followed by induction variables followed by `)`
+ llvm::SmallVector<mlir::OpAsmParser::Argument, 4> ivs;
+ if (parser.parseArgumentList(ivs, mlir::OpAsmParser::Delimiter::Paren))
+ return mlir::failure();
+
+ // Parse loop bounds.
+ llvm::SmallVector<mlir::OpAsmParser::UnresolvedOperand, 4> lower;
+ if (parser.parseEqual() ||
+ parser.parseOperandList(lower, ivs.size(),
+ mlir::OpAsmParser::Delimiter::Paren) ||
+ parser.resolveOperands(lower, builder.getIndexType(), result.operands))
+ return mlir::failure();
+
+ llvm::SmallVector<mlir::OpAsmParser::UnresolvedOperand, 4> upper;
+ if (parser.parseKeyword("to") ||
+ parser.parseOperandList(upper, ivs.size(),
+ mlir::OpAsmParser::Delimiter::Paren) ||
+ parser.resolveOperands(upper, builder.getIndexType(), result.operands))
+ return mlir::failure();
+
+ // Parse step values.
+ llvm::SmallVector<mlir::OpAsmParser::UnresolvedOperand, 4> steps;
+ if (parser.parseKeyword("step") ||
+ parser.parseOperandList(steps, ivs.size(),
+ mlir::OpAsmParser::Delimiter::Paren) ||
+ parser.resolveOperands(steps, builder.getIndexType(), result.operands))
+ return mlir::failure();
+
+ llvm::SmallVector<mlir::OpAsmParser::UnresolvedOperand> reduceOperands;
+ llvm::SmallVector<mlir::Type> reduceArgTypes;
+ if (succeeded(parser.parseOptionalKeyword("reduce"))) {
+ // Parse reduction attributes and variables.
+ llvm::SmallVector<fir::ReduceAttr> attributes;
+ if (failed(parser.parseCommaSeparatedList(
+ mlir::AsmParser::Delimiter::Paren, [&]() {
+ if (parser.parseAttribute(attributes.emplace_back()) ||
+ parser.parseArrow() ||
+ parser.parseOperand(reduceOperands.emplace_back()) ||
+ parser.parseColonType(reduceArgTypes.emplace_back()))
+ return mlir::failure();
+ return mlir::success();
+ })))
+ return mlir::failure();
+ // Resolve input operands.
+ for (auto operand_type : llvm::zip(reduceOperands, reduceArgTypes))
+ if (parser.resolveOperand(std::get<0>(operand_type),
+ std::get<1>(operand_type), result.operands))
+ return mlir::failure();
+ llvm::SmallVector<mlir::Attribute> arrayAttr(attributes.begin(),
+ attributes.end());
+ result.addAttribute(getReduceAttrsAttrName(result.name),
+ builder.getArrayAttr(arrayAttr));
+ }
+
+ // Now parse the body.
+ mlir::Region *body = result.addRegion();
+ for (auto &iv : ivs)
+ iv.type = builder.getIndexType();
+ if (parser.parseRegion(*body, ivs))
+ return mlir::failure();
+
+ // Set `operandSegmentSizes` attribute.
+ result.addAttribute(DoConcurrentLoopOp::getOperandSegmentSizeAttr(),
+ builder.getDenseI32ArrayAttr(
+ {static_cast<int32_t>(lower.size()),
+ static_cast<int32_t>(upper.size()),
+ static_cast<int32_t>(steps.size()),
+ static_cast<int32_t>(reduceOperands.size())}));
+
+ // Parse attributes.
+ if (parser.parseOptionalAttrDict(result.attributes))
+ return mlir::failure();
+
+ return mlir::success();
+}
+
+void hlfir::DoConcurrentLoopOp::print(mlir::OpAsmPrinter &p) {
+ p << " (" << getBody()->getArguments() << ") = (" << getLowerBound()
+ << ") to (" << getUpperBound() << ") step (" << getStep() << ")";
+
+ if (hasReduceOperands()) {
+ p << " reduce(";
+ auto attrs = getReduceAttrsAttr();
+ auto operands = getReduceOperands();
+ llvm::interleaveComma(llvm::zip(attrs, operands), p, [&](auto it) {
+ p << std::get<0>(it) << " -> " << std::get<1>(it) << " : "
+ << std::get<1>(it).getType();
+ });
+ p << ')';
+ }
+
+ p << ' ';
+ p.printRegion(getRegion(), /*printEntryBlockArgs=*/false);
+ p.printOptionalAttrDict(
+ (*this)->getAttrs(),
+ /*elidedAttrs=*/{DoConcurrentLoopOp::getOperandSegmentSizeAttr(),
+ DoConcurrentLoopOp::getReduceAttrsAttrName()});
+}
+
+llvm::SmallVector<mlir::Region *> hlfir::DoConcurrentLoopOp::getLoopRegions() {
+ return {&getRegion()};
+}
+
+llvm::LogicalResult hlfir::DoConcurrentLoopOp::verify() {
+ mlir::Operation::operand_range lbValues = getLowerBound();
+ mlir::Operation::operand_range ubValues = getUpperBound();
+ mlir::Operation::operand_range stepValues = getStep();
+
+ if (lbValues.empty())
+ return emitOpError(
+ "needs at least one tuple element for lowerBound, upperBound and step");
+
+ if (lbValues.size() != ubValues.size() ||
+ ubValues.size() != stepValues.size())
+ return emitOpError(
+ "different number of tuple elements for lowerBound, upperBound or step");
+
+ // Check that the body defines the same number of block arguments as the
+ // number of tuple elements in step.
+ mlir::Block *body = getBody();
+ if (body->getNumArguments() != stepValues.size())
+ return emitOpError() << "expects the same number of induction variables: "
+ << body->getNumArguments()
+ << " as bound and step values: " << stepValues.size();
+ for (auto arg : body->getArguments())
+ if (!arg.getType().isIndex())
+ return emitOpError(
+ "expects arguments for the induction variable to be of index type");
+
+ auto reduceAttrs = getReduceAttrsAttr();
+ if (getNumReduceOperands() != (reduceAttrs ? reduceAttrs.size() : 0))
+ return emitOpError(
+ "mismatch in number of reduction variables and reduction attributes");
+
+ return mlir::success();
+}
+
#include "flang/Optimizer/HLFIR/HLFIROpInterfaces.cpp.inc"
#define GET_OP_CLASSES
#include "flang/Optimizer/HLFIR/HLFIREnums.cpp.inc"
diff --git a/flang/test/HLFIR/do_concurrent.fir b/flang/test/HLFIR/do_concurrent.fir
new file mode 100644
index 0000000000000..aef9db2236a57
--- /dev/null
+++ b/flang/test/HLFIR/do_concurrent.fir
@@ -0,0 +1,92 @@
+// Test hlfir.do_concurrent operation parse, verify (no errors), and unparse
+
+// RUN: fir-opt %s | fir-opt | FileCheck %s
+
+func.func @dc_1d(%i_lb: index, %i_ub: index, %i_st: index) {
+ hlfir.do_concurrent {
+ %i = fir.alloca i32
+ hlfir.do_concurrent.loop (%i_iv) = (%i_lb) to (%i_ub) step (%i_st) {
+ %0 = fir.convert %i_iv : (index) -> i32
+ fir.store %0 to %i : !fir.ref<i32>
+ }
+ }
+ return
+}
+
+// CHECK-LABEL: func.func @dc_1d
+// CHECK-SAME: (%[[I_LB:.*]]: index, %[[I_UB:.*]]: index, %[[I_ST:.*]]: index)
+// CHECK: hlfir.do_concurrent {
+// CHECK: %[[I:.*]] = fir.alloca i32
+// CHECK: hlfir.do_concurrent.loop (%[[I_IV:.*]]) = (%[[I_LB]]) to (%[[I_UB]]) step (%[[I_ST]]) {
+// CHECK: %[[I_IV_CVT:.*]] = fir.convert %[[I_IV]] : (index) -> i32
+// CHECK: fir.store %[[I_IV_CVT]] to %[[I]] : !fir.ref<i32>
+// CHECK: }
+// CHECK: }
+
+func.func @dc_2d(%i_lb: index, %i_ub: index, %i_st: index,
+ %j_lb: index, %j_ub: index, %j_st: index) {
+ hlfir.do_concurrent {
+ %i = fir.alloca i32
+ %j = fir.alloca i32
+ hlfir.do_concurrent.loop
+ (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st) {
+ %0 = fir.convert %i_iv : (index) -> i32
+ fir.store %0 to %i : !fir.ref<i32>
+
+ %1 = fir.convert %j_iv : (index) -> i32
+ fir.store %1 to %j : !fir.ref<i32>
+ }
+ }
+ return
+}
+
+// CHECK-LABEL: func.func @dc_2d
+// CHECK-SAME: (%[[I_LB:.*]]: index, %[[I_UB:.*]]: index, %[[I_ST:.*]]: index, %[[J_LB:.*]]: index, %[[J_UB:.*]]: index, %[[J_ST:.*]]: index)
+// CHECK: hlfir.do_concurrent {
+// CHECK: %[[I:.*]] = fir.alloca i32
+// CHECK: %[[J:.*]] = fir.alloca i32
+// CHECK: hlfir.do_concurrent.loop
+// CHECK-SAME: (%[[I_IV:.*]], %[[J_IV:.*]]) = (%[[I_LB]], %[[J_LB]]) to (%[[I_UB]], %[[J_UB]]) step (%[[I_ST]], %[[J_ST]]) {
+// CHECK: %[[I_IV_CVT:.*]] = fir.convert %[[I_IV]] : (index) -> i32
+// CHECK: fir.store %[[I_IV_CVT]] to %[[I]] : !fir.ref<i32>
+// CHECK: %[[J_IV_CVT:.*]] = fir.convert %[[J_IV]] : (index) -> i32
+// CHECK: fir.store %[[J_IV_CVT]] to %[[J]] : !fir.ref<i32>
+// CHECK: }
+// CHECK: }
+
+func.func @dc_2d_reduction(%i_lb: index, %i_ub: index, %i_st: index,
+ %j_lb: index, %j_ub: index, %j_st: index) {
+ %sum = fir.alloca i32
+
+ hlfir.do_concurrent {
+ %i = fir.alloca i32
+ %j = fir.alloca i32
+ hlfir.do_concurrent.loop
+ (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st)
+ reduce(#fir.reduce_attr<add> -> %sum : !fir.ref<i32>) {
+ %0 = fir.convert %i_iv : (index) -> i32
+ fir.store %0 to %i : !fir.ref<i32>
+
+ %1 = fir.convert %j_iv : (index) -> i32
+ fir.store %1 to %j : !fir.ref<i32>
+ }
+ }
+ return
+}
+
+// CHECK-LABEL: func.func @dc_2d_reduction
+// CHECK-SAME: (%[[I_LB:.*]]: index, %[[I_UB:.*]]: index, %[[I_ST:.*]]: index, %[[J_LB:.*]]: index, %[[J_UB:.*]]: index, %[[J_ST:.*]]: index)
+
+// CHECK: %[[SUM:.*]] = fir.alloca i32
+
+// CHECK: hlfir.do_concurrent {
+// CHECK: %[[I:.*]] = fir.alloca i32
+// CHECK: %[[J:.*]] = fir.alloca i32
+// CHECK: hlfir.do_concurrent.loop
+// CHECK-SAME: (%[[I_IV:.*]], %[[J_IV:.*]]) = (%[[I_LB]], %[[J_LB]]) to (%[[I_UB]], %[[J_UB]]) step (%[[I_ST]], %[[J_ST]]) reduce(#fir.reduce_attr<add> -> %[[SUM]] : !fir.ref<i32>) {
+// CHECK: %[[I_IV_CVT:.*]] = fir.convert %[[I_IV]] : (index) -> i32
+// CHECK: fir.store %[[I_IV_CVT]] to %[[I]] : !fir.ref<i32>
+// CHECK: %[[J_IV_CVT:.*]] = fir.convert %[[J_IV]] : (index) -> i32
+// CHECK: fir.store %[[J_IV_CVT]] to %[[J]] : !fir.ref<i32>
+// CHECK: }
+// CHECK: }
diff --git a/flang/test/HLFIR/invalid.fir b/flang/test/HLFIR/invalid.fir
index d61efe0062e69..e14284f916bd9 100644
--- a/flang/test/HLFIR/invalid.fir
+++ b/flang/test/HLFIR/invalid.fir
@@ -1555,3 +1555,98 @@ func.func @bad_reshape(%arg0: !hlfir.expr<1x!fir.char<1,2>>, %arg1: !hlfir.expr<
%0 = hlfir.reshape %arg0 %arg1 pad %arg2 : (!hlfir.expr<1x!fir.char<1,2>>, !hlfir.expr<1xi32>, !hlfir.expr<1x!fir.char<2,?>>) -> !hlfir.expr<?x!fir.char<1,?>>
return
}
+
+// -----
+
+func.func @empty_dc_wrapper_body() {
+ // expected-error@+1 {{'hlfir.do_concurrent' op expects a non-empty block}}
+ hlfir.do_concurrent {
+ }
+ return
+}
+
+// -----
+
+func.func @dc_wrong_terminator() {
+ // expected-error@+1 {{'hlfir.do_concurrent' op must be terminated by 'hlfir.do_concurrent.loop'}}
+ hlfir.do_concurrent {
+ llvm.return
+ }
+ return
+}
+
+// -----
+
+func.func @dc_0d() {
+ // expected-error@+2 {{'hlfir.do_concurrent.loop' op needs at least one tuple element for lowerBound, upperBound and step}}
+ hlfir.do_concurrent {
+ hlfir.do_concurrent.loop () = () to () step () {
+ %tmp = fir.alloca i32
+ }
+ }
+ return
+}
+
+// -----
+
+func.func @dc_invalid_parent(%arg0: index, %arg1: index) {
+ // expected-error@+1 {{'hlfir.do_concurrent.loop' op expects parent op 'hlfir.do_concurrent'}}
+ "hlfir.do_concurrent.loop"(%arg0, %arg1) <{operandSegmentSizes = array<i32: 1, 1, 0, 0>}> ({
+ ^bb0(%arg2: index):
+ %tmp = "fir.alloca"() <{in_type = i32, operandSegmentSizes = array<i32: 0, 0>}> : () -> !fir.ref<i32>
+ }) : (index, index) -> ()
+ return
+}
+
+// -----
+
+func.func @dc_invalid_control(%arg0: index, %arg1: index) {
+ // expected-error@+2 {{'hlfir.do_concurrent.loop' op different number of tuple elements for lowerBound, upperBound or step}}
+ hlfir.do_concurrent {
+ "hlfir.do_concurrent.loop"(%arg0, %arg1) <{operandSegmentSizes = array<i32: 1, 1, 0, 0>}> ({
+ ^bb0(%arg2: index):
+ %tmp = "fir.alloca"() <{in_type = i32, operandSegmentSizes = array<i32: 0, 0>}> : () -> !fir.ref<i32>
+ }) : (index, index) -> ()
+ }
+ return
+}
+
+// -----
+
+func.func @dc_invalid_ind_var(%arg0: index, %arg1: index) {
+ // expected-error@+2 {{'hlfir.do_concurrent.loop' op expects the same number of induction variables: 2 as bound and step values: 1}}
+ hlfir.do_concurrent {
+ "hlfir.do_concurrent.loop"(%arg0, %arg1, %arg0) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>}> ({
+ ^bb0(%arg3: index, %arg4: index):
+ %tmp = "fir.alloca"() <{in_type = i32, operandSegmentSizes = array<i32: 0, 0>}> : () -> !fir.ref<i32>
+ }) : (index, index, index) -> ()
+ }
+ return
+}
+
+// -----
+
+func.func @dc_invalid_ind_var_type(%arg0: index, %arg1: index) {
+ // expected-error@+2 {{'hlfir.do_concurrent.loop' op expects arguments for the induction variable to be of index type}}
+ hlfir.do_concurrent {
+ "hlfir.do_concurrent.loop"(%arg0, %arg1, %arg0) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>}> ({
+ ^bb0(%arg3: i32):
+ %tmp = "fir.alloca"() <{in_type = i32, operandSegmentSizes = array<i32: 0, 0>}> : () -> !fir.ref<i32>
+ }) : (index, index, index) -> ()
+ }
+ return
+}
+
+// -----
+
+func.func @dc_invalid_reduction(%arg0: index, %arg1: index) {
+ %sum = fir.alloca i32
+ // expected-error@+2 {{'hlfir.do_concurrent.loop' op mismatch in number of reduction variables and reduction attributes}}
+ hlfir.do_concurrent {
+ "hlfir.do_concurrent.loop"(%arg0, %arg1, %arg0, %sum) <{operandSegmentSizes = array<i32: 1, 1, 1, 1>}> ({
+ ^bb0(%arg3: index):
+ %tmp = "fir.alloca"() <{in_type = i32, operandSegmentSizes = array<i32: 0, 0>}> : () -> !fir.ref<i32>
+ }) : (index, index, index, !fir.ref<i32>) -> ()
+ }
+ return
+}
✅ With the latest revision this PR passed the C/C++ code formatter.
Force-pushed from 0490357 to dd56911.
Overall LGTM. Please wait for somebody else to have a look before merging.
Force-pushed from dd56911 to dcc9686.
Thanks!
The approach looks good to me. A few questions:
- Why HLFIR and not FIR?
Putting it in HLFIR forces lowering it when moving from HLFIR to FIR, while it seems there is no deep requirement for this operation to be in HLFIR. The main reason for things to be in HLFIR is when they can manipulate hlfir.expr, which makes the so-called "bufferization" the natural transition between FIR and HLFIR. I am not sure it is best to tie the hlfir.do_concurrent-to-xxx conversion to bufferization (although I understand why you would want to run hlfir.do_concurrent-to-xxx early in the pipeline for some xxx, and that is OK); I think it is best to keep the freedom to maintain fir.do_concurrent up to when we lower fir.do_loop.
- What about locality specifiers?
I think you mentioned in the RFC that you intend to add them later to these ops; I just want to check that I got that part right.
@@ -1863,5 +1864,108 @@ def hlfir_EvaluateInMemoryOp : hlfir_Op<"eval_in_mem", [AttrSizedOperandSegments
   let hasVerifier = 1;
 }

def hlfir_DoConcurrentOp : hlfir_Op<"do_concurrent", [SingleBlock]> {
I think this should have the AutomaticAllocationScope interface to "pin" the loop indices allocas to it.
Right, done!
Thanks for that context, very useful for me. This was suggested by @kiranchandramohan but it also made sense to me; since, in my mind,
Yes, I want to reuse what we have in OpenMP regarding locality specifiers for the Fortran dialects as well. See: #128148. However, I am postponing this until I implement the
Force-pushed from dcc9686 to 2072558.
It was a suggestion from my side. Please follow @jeanPerier's direction if he has a strong opinion.
👍 @jeanPerier, will move to fir then since that seems consistent with the current dialect definitions. Let me know if you have other thoughts.
Thanks, that just seems the most flexible approach to me.
Force-pushed from 2072558 to 1dc58b5.
Thanks all for the comments! Moved the ops to fir. Please take a look when you have time.
Thanks, LGTM
Adds new MLIR ops to model `do concurrent`. In order to make the `do concurrent` representation self-contained, a loop is modeled using 2 ops, one wrapper and one that contains the actual body of the loop. For example, a 2D `do concurrent` loop is modeled as follows:
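A minimal sketch of that 2D example with the ops under the `fir` dialect; the only assumed difference from the earlier `hlfir` snippet is the dialect prefix, following the move to FIR discussed in the review:

```mlir
fir.do_concurrent {
  // Allocations for the loop's iteration variables live in the wrapper region.
  %i = fir.alloca i32
  %j = fir.alloca i32

  // The loop op carries the bounds/steps and the body; its region has no terminator.
  fir.do_concurrent.loop
      (%i_iv, %j_iv) = (%i_lb, %j_lb) to (%i_ub, %j_ub) step (%i_st, %j_st) {
    %0 = fir.convert %i_iv : (index) -> i32
    fir.store %0 to %i : !fir.ref<i32>

    %1 = fir.convert %j_iv : (index) -> i32
    fir.store %1 to %j : !fir.ref<i32>
  }
}
```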
The `fir.do_concurrent` wrapper op encapsulates both the actual loop and the allocations required for the iteration variables. The `fir.do_concurrent.loop` op is a multi-dimensional op that contains the loop control and body. See the ops' docs for more info.