[RFC] Metadata attachments for function arguments

Hello everyone!

Problem

As described in eg Support !range metadata for by-value function arguments · Issue #76628 · llvm/llvm-project (github.com) it is currently not possible to give range information for function arguments.

Proposal

After reading this (old and maybe not relevant anymore) discussion String attributes for function arguments and return values - Project Infrastructure / LLVM Dev List Archives - LLVM Discussion Forums my proposal to solve the problem is to add support for Metadata attachments for function arguments in defines and declares like eg:
define void @test4(i32 noundef !foo !2 !baz !8 %a, i32 !range !9 %b)

by extending the argument list syntax like:
<type> [parameter Attrs] (!name !N)* [name]

As this is my first time trying to contribute to llvm I started by creating a prof of concept in [IR] Add support for metadata attachments for Function Arguments by andjo403 · Pull Request #78893

Alternative

Create a new attribute for range.
From what I can see this needs some new storage handling as before only int/enum/string/types have been stored in attributes but maybe I’m wrong.
One other thing is that there is already handling for eg range metadata so shall this attribute return a MDNode so that code can be reused?

1 Like

I don’t think we should add support for metadata attachments on arguments, as this will introduce two different ways to represent constraints on function arguments. The attribute system has the advantage of being more efficient, while the metadata system is more generic – but I don’t think we have need of its full generality here.

Create a new attribute for range.

This alternative would definitely be my preference.

From what I can see this needs some new storage handling as before only int/enum/string/types have been stored in attributes but maybe I’m wrong.

Yes, this would require a new storage type. I would recommend storing it as a ConstantRange.

I think this storage type will also come in handy for other things. An obvious example is the existing vscale_range attribute (currently encoded as a packed integer, and gets away with it because the ranges it represents are somewhat constrained). Another example would be an extension to dereferenceable that specifies both a lower and an upper bound, rather than only a lower bound (this is useful for alias analysis).

One other thing is that there is already handling for eg range metadata so shall this attribute return a MDNode so that code can be reused?

No, the API should return a ConstantRange. That’s what all the code later works on.

The !range MDNode representation is slightly more general in that it allows specifying multiple ranges, but I believe that this is an unnecessary over-generalization. I don’t think frontends make use of this feature, and the optimizer usually just unions all the constituent ranges together.

As functions have metadata and attributes I thought it was two by design and that metadata was supposed to only be used for optimization hints as it can be removed.

one downside of not using the same format as in the metadata is that if the frontend creates multiple ranges in the range metadata it now needs a new functionality to create the data for the attribute.

My interpretation was that multiple ranges was to support enums with gaps in them, but as you say most uses seems to combine them, valueTracking was one exception from that.

Not many more comments for this.
Looks like I shall close my current PR that add metadata to function arguments and the proposal is to add a new Attribute range(<min>, <max>) to Parameter Attributes by adding the possibility for ConstantRange attributes.
@nikic shall a new RFC be created for the attribute proposal or shall it be the result of this one?

Hi again have been trying to create the ConstantRange attribute type but have a problem with my previous suggestion of range(<min>,<max>) as there is no information about bit width of the APInt in the ConstantRange.

So my question is shall there be some information about number of bit in the IR?

For MDNode each field have a type like (i32 42) so shall the ConstantRange attribute type also have a type or a bit width? alternatives:

  1. use one type : range(<ty>, <min>,<max>) range(i32,-1,255)
  2. use a number of bits : range(<bits>, <min>,<max>) range(32, -1,255)
  3. use a type per value (needs to be the same) : range(<ty> <min>, <ty> <max>) range(i32 -1, i32 255)

or maybe there is no need for a fixed bit width.
in that case maybe when parsing positive values add 1 bit to the bit width of the lexed value and always write as signed APInt to get the asm parse/write roundtrip to give the same result. then also need to select the largest of min and max bit width for the ConstantRange bit width.
feels a bit strange to not have a fixed bit width.

I would cap it at 64 in the lang ref. The parser can then read it as such. Neither more nor less bits seem of practical use.

1 Like

That suggestion simplifies it a lot. I was so fixed in making it similar to how the metadata range was working that I did not think of that solution.

Limiting this to 64-bits is a bit unfortunate. This is fine for attributes like dereferenceable() where larger values truly make no sense, but for range() use with larger integer types is plausible. For example, Rust has a NonZeroU128 type, which could make use of the attribute.

I was going to suggest that we can make use of the type specified for the parameter, i.e. when we have i32 range(A,B) %x parse it as a 32-bit range. Unfortunately, this wouldn’t work straightforwardly for return values, because they use (why???) the reverse order range(A,B) i32 instead.

Generally, I would expect there to be a verifier requirement that the parameter type and the range() attribute have the same bitwidth. Otherwise the attribute becomes hard to interpret. If you always store a 64-bit range, then truncating it to the correct width will not produce the desired result (it will be a full range instead).

So overall I think we’ll have to live with something slightly ugly like i32 range(i32 A, B) %x for params and range(i32 A, B) i32 for returns.

I don’t get this, esp. when the two values fit into 32-bit. If this was not the case, one could pick 128 instead of 64.

What I was thinking about when I wrote that is that you’d take a 64-bit ConstantRange and then call truncate() on it to adjust it to the correct size. If you have a range like [1,0) and truncate() that to a smaller size you’ll get a full range back. What would probably work fine is to construct a new ConstantRange by explicitly truncating Lower and Upper.

But I still think that allowing the range to have a different bit-width from the parameter type is going to cause unnecessary complications…

Yeah. I start to think you are right and we need to just pay the cost of annotating the type.

have a PR ready now for adding the attribute and the new ConstantRange attribute type [IR] Add new Range attribute using new ConstantRange Attribute type by andjo403 · Pull Request #83171 · llvm/llvm-project (github.com)

Can I use the range attribute in assume operand bundles?

The perf impact of Add `assume`s to slice length calls by scottmcm · Pull Request #122926 · rust-lang/rust · GitHub was pretty high, and I’m hoping that doing it with the attribute instead of icmp would help…

I don’t see why not. That’s what assume operand bundles are meant for.
It’s not equivalent to the proposal but it has other benefits.

There is currently no optimization that use the range attribute from “assume operand bundles”.

I can try to add it but maybe will be a while, I’m still working on adding handling for all the places that use the range metadata.

Also do not know how easy it is to add from what I found almost only value tracking have used “assume operand bundles” before and do not know if that is enough for the optimization that you want.

I’m in no rush here :slight_smile:

I’d just like to be able to use fewer icmp+assumes in things like

I can’t say I know what “value tracking” is specifically, but from the name it sounds like it’d plausibly be sufficient.

Maybe even better than the assumes today, since I hit some places where the assume(ule(x, INT_MAX)) turns into assume(sgt(x, -1)) and that ends up confusing the optimizer into not being able to take advantage of it later.

Could you share an example for that? For range-based optimizations, we usually don’t care how exactly the comparison is written, so this seems unexpected…

Thanks for the poke on this. I opened `nuw nsw` not deduced for `add 1` inbounds of range-restricted length · Issue #87854 · llvm/llvm-project · GitHub for what I think it was.

(I don’t remember exactly what I was seeing originally, and of course before I was looking at Rust output which isn’t on trunk, but I think that issue is somewhat close at least.)