MLIR LLVM-IR dialect -- status and lowering questions

(I BCC'ed flang-dev as they might be interested in this as well.)

Last week at SC we were discussing what the fastest and safest way to
get a working (=stable) F18 compiler is. Designing FIR and two (or
three) lowerings (AST -(1)-> FIR -(2)-> (MLIR LLVM-IR -(3)->) LLVM-IR)
seems to be where we want to end up eventually, but it might not be
the fastest solution.

The options I'm currently evaluating wrt. the effort to reach a stable solution,
and which I will discuss in more detail on the flang-dev list, are:
  1) AST -> LLVM-IR
  2) AST -> MLIR LLVM-IR -> LLVM-IR
  3) AST -> FIR -> MLIR LLVM-IR -> LLVM-IR

My questions/statements here are concerning the MLIR LLVM-IR dialect and
the lowerings to/from it, as well as the interplay of MLIR dialects and
LLVM-IR. I formulated some parts as statements and I would appreciate it
if you could comment on my understanding. If I missed the appropriate
documentation page, please forgive me.

1) Can I lower different dialects into LLVM-IR at the same time or do I
   need to lower to MLIR LLVM-IR first? When I ask if "I can do that" I
   mean if it is a use case that should conceptually work and also if it
   is already done by someone, thus actually working right now.
2) I browsed the LLVM-IR MLIR dialect and it looks like the
   instructions, attributes, etc. are hard coded, correct? (I mean we
   need to add them one by one to match LLVM-IR and keep them in-sync).
3) As far as I can tell,
   a) various instructions are present already (in their basic form,
      e.g., no nsw/nuw, inbounds, ...) but some seem to be
      missing (switch was one I didn't find immediately). Is there a
      list?
   c) I also only found a handful of attributes (noalias & nosideeffect).
   d) Global symbols seem to be very restricted right now, e.g.,
      variables are internal only, functions external, right?

I might have more questions but this seems to be a good starting point.

Thanks in advance,
  Johannes

Hi Mehdi,

thanks for the quick reply!

With your general confirmation of my findings *and* the additional
details you provided I should be able to compose a pros/cons list for
the F18 people this week.

I'll probably move the conversation to the flang-dev list now, sign up
in case you are interested :). However, I still inlined some comments
below.

Cheers,
  Johannes

> (I BCC'ed flang-dev as they might be interested in this as well.)
>
> Last week at SC we were discussing what the fastest and safest way to
> get a working (=stable) F18 compiler is. Designing FIR and two (or
> three) lowerings (AST -(1)-> FIR -(2)-> (MLIR LLVM-IR -(3)->) LLVM-IR)
> seems to be something we want to end up eventually but it might not be
> the fastest solution.
>
> The options I'm currently evaluating wrt. complexity to stable solution,
> and which I will discuss in more detail on the flang-dev list, are:
> 1) AST -> LLVM-IR
> 2) AST -> MLIR LLVM-IR -> LLVM-IR
> 3) AST -> FIR -> MLIR LLVM-IR -> LLVM-IR
>

I suspect 2 makes sense over 1 only if you include more MLIR constructs
(like the OpenMP dialect), otherwise I don't see the difference between 1
and 2?

It depends. I think 1 is "less" work but more "wasted" work.

> My questions/statements here are concerning the MLIR LLVM-IR dialect and
> the lowerings to/from it, as well as the interplay of MLIR dialects and
> LLVM-IR. I formulated some parts as statements and I would appreciate it
> if you could comment on my understanding. If I missed the appropriate
> documentation page, please forgive me.
>
> 1) Can I lower different dialects into LLVM-IR at the same time or do I
> need to lower to MLIR LLVM-IR first? When I ask if "I can do that" I
> mean if it is a use case that should conceptually work and also if it
> is already done by someone, thus actually working right now.
>

You can export to LLVM IR on your own from any dialect (ultimately Clang
does it while traversing its AST, so you can traverse your own IR the same
way).

Perfect. That is needed for the FIR + OpenMP dialect already since the
latter shall not be lowered to MLIR LLVM-IR first.

The point of the LLVM dialect is to make this easier and more re-usable /
composable though.

That makes sense.
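
To make the "traverse your own IR" point concrete, here is a minimal sketch in Python. It is not MLIR code: the `Op` class, the `dialect_a.add` and `dialect_b.ret` op names, and the emitter table are all hypothetical and only illustrate how ops from two different dialects could be exported to textual LLVM IR in a single walk, without first converting them to a common intermediate dialect.

```python
from dataclasses import dataclass

@dataclass
class Op:
    """One operation in a toy IR; `name` has the form 'dialect.opname'."""
    name: str
    result: str = ""
    operands: tuple = ()

def emit_add(op):
    # Hypothetical op from "dialect_a": integer addition on i32.
    lhs, rhs = op.operands
    return f"  {op.result} = add i32 {lhs}, {rhs}"

def emit_ret(op):
    # Hypothetical op from "dialect_b": return an i32 value.
    (value,) = op.operands
    return f"  ret i32 {value}"

# One table holds the emitters for both (hypothetical) dialects.
EMITTERS = {
    "dialect_a.add": emit_add,
    "dialect_b.ret": emit_ret,
}

def export_function(fn_name, params, ops):
    """Walk the ops once and emit textual LLVM IR for an i32 function."""
    signature = ", ".join(f"i32 {p}" for p in params)
    lines = [f"define i32 @{fn_name}({signature}) {{"]
    for op in ops:
        emitter = EMITTERS.get(op.name)
        if emitter is None:
            raise NotImplementedError(f"no LLVM IR emitter for {op.name}")
        lines.append(emitter(op))
    lines.append("}")
    return "\n".join(lines)

ops = [
    Op("dialect_a.add", "%sum", ("%a", "%b")),
    Op("dialect_b.ret", operands=("%sum",)),
]
print(export_function("f", ["%a", "%b"], ops))
```

The cost of this approach is visible in the table: every op of every dialect needs its own hand-written emitter, which is the duplication a shared LLVM dialect is meant to avoid.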

> 2) I browsed the LLVM-IR MLIR dialect and it looks like the
> instructions, attributes, etc. are hard coded, correct? (I mean we
> need to add them one by one to match LLVM-IR and keep them in-sync).
>

Yes.

We have this longer term idea to generate the LLVM IR constructs (at least
the verifiers, etc.) from the same tablegen as the MLIR dialect (it can be
generating the MLIR dialect from a TableGen in LLVM for instance), that's
gonna take more discussions within LLVM though.

I think that is reasonable, probably takes a while to prepare though.

> 3) As far as I can tell,
> a) various instructions are present already (in their basic form,
> e.g., no nsw/nuw, inbounds, ...) but there seems to be some
> missing (switch was one I didn't find immediately). Is there a
> list?
>

I don't think we have a list, we discussed this recently:
https://groups.google.com/a/tensorflow.org/d/msg/mlir/gUTcuFex7eA/Ebj38saiBQAJ
It'd be nice to compute the full list indeed.

Thanks for the reference. I read this thread a month ago but forgot
about it again.
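
As a rough illustration of how such a list could be computed, the sketch below takes the set difference between a reference set of LLVM IR opcodes and the set of ops a dialect implements. The opcode names (terminators and binary operations only) come from the LLVM Language Reference. `DIALECT_OPS` is a made-up placeholder, not the actual contents of the MLIR LLVM-IR dialect; a real version would have to extract that set from the dialect's definitions.

```python
# Terminator and binary-operation opcodes from the LLVM Language Reference.
LLVM_TERMINATORS = {
    "ret", "br", "switch", "indirectbr", "invoke", "callbr",
    "resume", "catchswitch", "catchret", "cleanupret", "unreachable",
}
LLVM_BINARY_OPS = {
    "add", "fadd", "sub", "fsub", "mul", "fmul",
    "udiv", "sdiv", "fdiv", "urem", "srem", "frem",
}

# ASSUMPTION: placeholder set for illustration only. This is NOT the real
# list of operations implemented by the MLIR LLVM-IR dialect.
DIALECT_OPS = {"ret", "br", "add", "sub", "mul", "fadd", "fsub", "fmul"}

def missing_ops(reference, implemented):
    """Return the reference opcodes that have no counterpart, sorted."""
    return sorted(reference - implemented)

print(missing_ops(LLVM_TERMINATORS | LLVM_BINARY_OPS, DIALECT_OPS))
```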

> c) I also did only find a handful of attributes (noalias &
> nosideeffect).
>

I don't even think that "nosideeffect" maps to anything when exporting to
LLVM IR at the moment.

Right. So noalias is the blueprint it seems.

> d) Global symbols seems to be very restricted right now e.g.,
> variables are internal only, functions external, right?
>

Right, we haven't yet added Linkage and ThreadLocalMode attributes on
these. The dialect has been brought up as the need arose for lowering from
higher-level dialects.
The more interesting part is handling these in the general FuncOp.

What do you mean by the last sentence? Is there some bigger design
question here to add linkage types (etc.) to the MLIR LLVM-IR dialect?

Cheers,
  Johannes

>
> > (I BCC'ed flang-dev as they might be interested in this as well.)
> >
> > Last week at SC we were discussing what the fastest and safest way to
> > get a working (=stable) F18 compiler is. Designing FIR and two (or
> > three) lowerings (AST -(1)-> FIR -(2)-> (MLIR LLVM-IR -(3)->) LLVM-IR)
> > seems to be something we want to end up eventually but it might not be
> > the fastest solution.
> >
> > The options I'm currently evaluating wrt. complexity to stable solution,
> > and which I will discuss in more detail on the flang-dev list, are:
> > 1) AST -> LLVM-IR
> > 2) AST -> MLIR LLVM-IR -> LLVM-IR
> > 3) AST -> FIR -> MLIR LLVM-IR -> LLVM-IR
> >
>
> I suspect 2 makes sense over 1 only if you include more MLIR construct
> (like the OpenMP dialect), otherwise I don't see the difference between 1
> and 2?

It depends. I think 1 is "less" work but more "wasted" work.

1. Trying to bridge from the f18 parse trees straight to LLVM-IR will be a ton
of work and should be throw-away work at that.

2. has problems in that semantics from Fortran cannot be represented in generic
off-the-shelf MLIR. It would've been great, but alas. :)

3. This is where the real action is presently. It would be great to have more
volunteers, who want to contribute to the project, jump in and assist here.
There is plenty that can be worked on. Please contact me, if interested.

> > My questions/statements here are concerning the MLIR LLVM-IR dialect and
> > the lowerings to/from it, as well as the interplay of MLIR dialects and
> > LLVM-IR. I formulated some parts as statements and I would appreciate it
> > if you could comment on my understanding. If I missed the appropriate
> > documentation page, please forgive me.
> >
> > 1) Can I lower different dialects into LLVM-IR at the same time or do I
> > need to lower to MLIR LLVM-IR first? When I ask if "I can do that" I
> > mean if it is a use case that should conceptually work and also if it
> > is already done by someone, thus actually working right now.
> >
>
> You can export to LLVM IR on your own from any dialect (ultimately Clang
> does it while traversing its AST, so you can traverse your own IR the same
> way).

Perfect. That is needed for the FIR + OpenMP dialect already since the
latter shall not be lowered to MLIR LLVM-IR first.

The Tilikum bridge lowers the bulk of FIR (modulo bugs and NYI features) and
the standard MLIR dialect both in one big dialect conversion. Much of Tilikum
is prototyped now.

But that's the beginning of the story, as there are features to be completed
in the MLIR LLVM-IR dialect as well (which is where your subsequent conversion
goes). Furthermore, features to support Fortran will need to find their way
into LLVM (proper) itself.

> >
> > > (I BCC'ed flang-dev as they might be interested in this as well.)
> > >
> > > Last week at SC we were discussing what the fastest and safest way to
> > > get a working (=stable) F18 compiler is. Designing FIR and two (or
> > > three) lowerings (AST -(1)-> FIR -(2)-> (MLIR LLVM-IR -(3)->) LLVM-IR)
> > > seems to be something we want to end up eventually but it might not be
> > > the fastest solution.
> > >
> > > The options I'm currently evaluating wrt. complexity to stable solution,
> > > and which I will discuss in more detail on the flang-dev list, are:
> > > 1) AST -> LLVM-IR
> > > 2) AST -> MLIR LLVM-IR -> LLVM-IR
> > > 3) AST -> FIR -> MLIR LLVM-IR -> LLVM-IR
> > >
> >
> > I suspect 2 makes sense over 1 only if you include more MLIR construct
> > (like the OpenMP dialect), otherwise I don't see the difference between 1
> > and 2?
>
> It depends. I think 1 is "less" work but more "wasted" work.

1. Trying to bridge from the f18 parse trees straight to LLVM-IR will be a ton
of work and should be throw-away work at that.

I'd like to understand what work would be conceptually different to
things we need to do for 3) as well. I would have assumed we need to
lower Fortran constructs to LLVM-IR eventually; whether we do it in one
or two steps should not matter much, except that in the former case we
do not need to keep some Fortran constructs alive in a different
representation.

I would also think the second lowering step in 3) (FIR -> LLVM-IR)
would look pretty much like the lowering in 1)? If you disagree please
let me know why.

2. has problems in that semantics from Fortran cannot be represented in generic
off-the-shelf MLIR. It would've been great, but alas. :)

Interesting. Does that mean you do not plan to lower FIR to the LLVM-IR
MLIR dialect at all or do you plan to make MLIR stronger to support the
lowering?

3. This is where the real action is presently. It would be great to have more
volunteers, who want to contribute to the project, jump in and assist here.
There is plenty that can be worked on. Please contact me, if interested.

As I mentioned on the MLIR mailing list before, I'm not looking for more
work (others might though). On the contrary, I'm trying to determine how
we can get a working Fortran compiler with the least amount of work. The
MLIR folks have also mentioned various (fairly basic) parts that need to
be worked on (see the thread on the other list).

> > > My questions/statements here are concerning the MLIR LLVM-IR dialect and
> > > the lowerings to/from it, as well as the interplay of MLIR dialects and
> > > LLVM-IR. I formulated some parts as statements and I would appreciate it
> > > if you could comment on my understanding. If I missed the appropriate
> > > documentation page, please forgive me.
> > >
> > > 1) Can I lower different dialects into LLVM-IR at the same time or do I
> > > need to lower to MLIR LLVM-IR first? When I ask if "I can do that" I
> > > mean if it is a use case that should conceptually work and also if it
> > > is already done by someone, thus actually working right now.
> > >
> >
> > You can export to LLVM IR on your own from any dialect (ultimately Clang
> > does it while traversing its AST, so you can traverse your own IR the same
> > way).
>
> Perfect. That is needed for the FIR + OpenMP dialect already since the
> latter shall not be lowered to MLIR LLVM-IR first.

The Tilikum bridge lowers the bulk of FIR (modulo bugs and NYI features) and
the standard MLIR dialect both in one big dialect conversion. Much of Tilikum
is prototyped now.

Tilikum bridge? Is that what lowers FIR + MLIR into LLVM-IR? Where do I
find this prototype? Is there a list of features (expected to be) working?

But that's the beginning of the story, as there are features to be completed
in the MLIR LLVM-IR dialect as well (which is where your subsequent conversion
goes). Furthermore, features to support Fortran will need to find their way
into LLVM (proper) itself.

Could you elaborate on both these points? What features do we need in
the LLVM-IR MLIR dialect? (Do we lower FIR into that one? see above.)
What features are you missing in LLVM-IR? It might be good to get these
discussions going sooner rather than later to speed up the development and
planning process.

--
Eric

Cheers,
  Johannes

Hi Johannes,

Yes, the plan for flang/f18 is certainly to use the LLVM-IR dialect of MLIR as the pathway to LLVM.

As for the question about the differing breadths of the semantic gaps: by way of analogy, it should be clear that there is a real difference in labor and time between buying orange juice at the grocery store and growing one's own orange tree, even though the juice in the glass at breakfast is very much the same.

While I certainly understand a desire to accelerate Flang/F18 development, I worry paths #1 and #2 miss the goal of supporting strong Fortran capabilities.

To echo what Eric said in a slightly different way, the FIR stage supports Fortran-centric analysis and optimization capabilities. I believe LLVM IR (at the core or in MLIR form) loses an awareness that is needed to develop a fully optimizing Fortran compiler. #1 and #2 support a path for a (perhaps more C-family friendly) optimizing compiler but not necessarily that of an optimizing Fortran compiler. I think that’s a significant difference and a disadvantage, one that likely misses many of the potential advantages to be gained by analysis and optimization at a level with more Fortran awareness.

Before adoption into the LLVM community, the plan of record for F18/Flang was to introduce a FIR level to enable such Fortran-centric features. By leveraging MLIR there is a chance to avoid building an entirely one-off piece of infrastructure. In fact, it may be (and probably is) the case that an MLIR-based implementation is more robust than a brand-new, standalone FIR implementation.

There may be a good argument for a faster delivery path but we should also understand that it is likely to fall short of achieving what many would consider Fortran-centric performance improvements. I think gaining some of the advantages of the FIR implementation path in approach #1 or #2 would require changes at the LLVM IR level. I’m not sure that’s a reasonable path to take across the board…

—Pat

Dear Eric,

Hi Johannes,

Sorry for any confusion.

I disagree that the proposed shortcuts are actually shortcuts.

“3. This is where the real action is presently. It would be great to have more
volunteers, who want to contribute to the project, jump in and assist here.
There is plenty that can be worked on. Please contact me, if interested.”

Eric, would it be possible to create a projects page (https://github.com/orgs/flang-compiler/projects) if you believe there is work that others can do to help here?

It would also be helpful if there are public instructions on how to build the f18 compiler, mlir repo and llvm together. Which repos/branches should we be using? And are PRs being accepted to these repos/branches? Or are you talking about contributions to upstream MLIR repo for the llvm dialect?

Is (3) (AST → FIR → MLIR LLVM-IR → LLVM-IR) in a shape where we can generate LLVM IR now?

Thanks,
Kiran