[RFC] `address` dialect

That’s fair, and I’d love to actually split the LLVM dialect into its individual components (pointers and pointer manipulation, arithmetic, structure/aggregate manipulation, etc.). The LLVM dialect would benefit from being organized as a collection of such components: LLVM is monolithic by construction, but the independent concepts still exist inside LLVM IR.
We’re already splitting the LLVM “target” with the LLVMTranslationInterface used by NVVM and others, so I believe this wouldn’t be a very disruptive change (other than the dialect prefix in the textual format).

I wouldn’t be against using a SPIRV concept if it were relevant, but it turns out that SPIRV types/operations exist to be restrictive and specific to a semantics that does not generalize. What struck me here is that since LLVM moved to “typeless” pointers, the pointer proposal here looks awfully like the LLVM pointer (which is not surprising, since I suspect LLVM pointers and pointer arithmetic are not far from the purest form you can design).

To my thinking, this is actually the strongest reason to not add an address dialect right now. If the path is to modularize llvm, then it will immediately make little sense in that world, and we’ll soon be having the same manner of debate that we’re currently having with arith.

That’s debatable, and it could be argued that those kinds of ptrs are neither ptrs nor memrefs. However, people have modeled fat pointers as pointers in specific address spaces for a while; see, for example, this CHERI presentation.

In that case, there’s no round trip and I’d consider that part of the semantics of the address space.

@stellaraccident is completely right here: my proposal came from a place where the LLVM dialect should be a faithful representation of LLVM IR. I also agree that type duplication should be avoided, which is why I later pivoted and proposed making !llvm.ptr a builtin type; there’s no reason it can’t be one.

Having said that, even after the split I think there’s still a need for some of these ops.

Revised proposal

Make !llvm.ptr a builtin-type ptr with a generic attribute address space, and add all 7 ops as the address dialect to interact with the new builtin.ptr type.

Most of the new ops don’t collide with LLVM ops, and the ones that do, like cast and cast_int, can be merged when the split happens.

This looks like an interesting piece of work and has opened a really interesting conversation! Having ptr as a builtin type looks like a great idea and would make sense for many projects such as yours, and ours (SYCL-MLIR, EuroLLVM poster).

We have to +1 allowing any kind of attribute as the address space. We also identified this as a lack of abstraction in llvm.ptr. In the SYCL case, we can have four high-level memory spaces: private, local, global and generic. Since SYCL is a multi-target programming paradigm, these will be converted to different address spaces depending on the target, so having this bit of abstraction would be ideal. Of course, further restrictions would be added when translating to LLVM IR.
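To make the multi-target point concrete, here is a minimal Python sketch (not MLIR code) of how the four abstract SYCL spaces might resolve to numeric LLVM address spaces once a target is known. The table `TARGET_ADDRESS_SPACES` and the helper `resolve_address_space` are hypothetical names invented for illustration, and the numbers are assumptions based on the commonly documented NVPTX, AMDGPU and SPIR conventions; they should be checked against each target's documentation.

```python
# Hypothetical table: abstract SYCL memory space -> numeric LLVM address
# space, per target. The numbers are assumptions for illustration only.
TARGET_ADDRESS_SPACES = {
    "nvptx": {"generic": 0, "global": 1, "local": 3, "private": 5},
    "amdgpu": {"generic": 0, "global": 1, "local": 3, "private": 5},
    "spir": {"private": 0, "global": 1, "local": 3, "generic": 4},
}


def resolve_address_space(target, space):
    """Return the numeric address space for an abstract space on a target."""
    try:
        return TARGET_ADDRESS_SPACES[target][space]
    except KeyError:
        # Unknown target or unknown abstract space: refuse to guess.
        raise ValueError(f"no mapping for {space!r} on {target!r}") from None


# The same abstract space resolves differently depending on the target.
generic_on_spir = resolve_address_space("spir", "generic")
generic_on_amdgpu = resolve_address_space("amdgpu", "generic")
```

The point of the sketch is only that the abstract attribute stays in the IR until the target is chosen, and the numeric value appears at conversion time.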

The current dialect looks nice, though! We are, however, thinking of two interesting proposals that might be handy for some projects:

  • ptrdiff: difference between two pointers
  • nullptr: Some targets might not use 0 as a nullptr representation. LLVM IR is lacking this abstraction, but we can have it and assign a value on conversion. This way, e.g., for an AMDGPU target, address.nullptr : ptr<#gpu.address_space<global>> would lower to llvm.mlir.zero and address.nullptr : ptr<#gpu.address_space<workgroup>>, to llvm.mlir.constant 0xFFFFFFFF (see).
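The nullptr idea above can be sketched roughly as follows. This is plain Python standing in for a conversion pattern, not an actual MLIR pass: `AMDGPU_NULL_VALUES` and `lower_nullptr` are hypothetical names, and the only values used are the two from the AMDGPU example in the bullet (0 for global, 0xFFFFFFFF for workgroup).

```python
# Hypothetical per-target table of null-pointer bit patterns, keyed by the
# abstract address space. Values are taken from the AMDGPU example above.
AMDGPU_NULL_VALUES = {"global": 0x0, "workgroup": 0xFFFFFFFF}


def lower_nullptr(address_space, null_values):
    """Pick the textual LLVM-dialect op that an abstract nullptr lowers to."""
    value = null_values[address_space]
    if value == 0:
        # An all-zero null pointer can use the zero-initializer op.
        return "llvm.mlir.zero"
    # Any other bit pattern has to be materialized as an explicit constant.
    return f"llvm.mlir.constant 0x{value:X}"


global_null = lower_nullptr("global", AMDGPU_NULL_VALUES)
workgroup_null = lower_nullptr("workgroup", AMDGPU_NULL_VALUES)
```

Keeping the bit pattern in a per-target table means the abstract nullptr op never needs to know the value, which is the abstraction LLVM IR itself lacks.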

We think this is an interesting topic and would add value to MLIR, as we can see other projects benefiting from this. It’s always interesting to see other people’s work and how they bring good ideas to the table.

Thanks for the proposal. I think it would be nice to have a dedicated dialect for addressing and the pointer add operation is especially nice.

Can you explain the motivation for the type_offset operation? I would have expected that the size of a type can be determined using the DataLayout dialect? Or do you see a benefit in having the size explicit in IR somehow?

Yes, it should be possible to determine the type offset from the data layout. It’s not mentioned in the proposal, but I already have in mind a pass that substitutes type_offset with arith constants using the data layout.

Now, type_offset exists because there’s no guarantee that a data layout will be available from the start, e.g. in cases where the target has not been determined. In those cases a different mechanism should exist to express the type offset (addr.type_offset).
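As a rough illustration of that substitution pass, here is a small Python model (not the actual implementation, which does not exist in this thread). `TypeOffset`, `Constant` and `fold_type_offset` are hypothetical names, and the data layout is assumed to be a plain mapping from type name to size in bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeOffset:
    """Symbolic size of a type, standing in for addr.type_offset."""
    type_name: str


@dataclass(frozen=True)
class Constant:
    """Concrete integer, standing in for an arith constant."""
    value: int


def fold_type_offset(op, data_layout=None):
    """Replace a symbolic offset with a constant when the layout knows it."""
    if data_layout is None or op.type_name not in data_layout:
        # No target chosen yet (or unknown type): keep the symbolic op.
        return op
    return Constant(data_layout[op.type_name])


# Without a layout the op survives; with one it becomes a constant.
unresolved = fold_type_offset(TypeOffset("f32"))
resolved = fold_type_offset(TypeOffset("f32"), {"f32": 4})
```

The symbolic form lets target-independent IR be written early, while the fold makes it free once the layout is attached.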

That makes sense! I was not thinking of such a high-level scenario :).

Continuing the discussion here: