[RFC] Add nusw and nuw flags for getelementptr

nikic · April 30, 2024, 4:45am

(This follows up on a previous discussion in Retain fact that index/offsets for GEP are non-negative for `size_t/uint64_t` offsets? as an RFC.)

Proposal

Add two new flags to the getelementptr instruction, in addition to the current inbounds flag. The two new flags are:

nusw (no-unsigned-signed wrap). This carries all the overflow-related requirements from inbounds, without the allocated object requirements. inbounds implies nusw and the latter is not explicitly printed if inbounds is set.
nuw (no-unsigned-wrap), specifies that both the offset arithmetic and the addition of the offset to the base address do not wrap in an unsigned sense.

As the getelementptr inbounds rules are unusually complex, I’ll provide a draft of the proposed LangRef wording here:

The getelementptr instruction may have a number of attributes with different rules. If any of the rules are violated, the result value is a poison value. In cases where the base is a vector of pointers, the attributes apply to each computation element-wise.

For nusw (no unsigned signed wrap):

If the type of an index is larger than the pointer index type, the truncation to the pointer index type preserves the signed value (trunc nsw).

The multiplication of an index by the type size does not wrap the pointer index type in a signed sense (mul nsw).

The successive addition of each offset (without adding the base address) does not wrap the pointer index type in a signed sense (add nsw).

The successive addition of the current address, interpreted as an unsigned number, and each offset, interpreted as a signed number, does not wrap the unsigned address space.

For nuw (no unsigned wrap).

If the type of an index is larger than the pointer index type, the truncation to the pointer index type preserves the unsigned value (trunc nuw).

The multiplication of an index by the type size does not wrap the pointer index type in an unsigned sense (mul nuw).

The successive addition of each offset (without adding the base address) does not wrap the pointer index type in a unsigned sense (add nuw).

The successive addition of the current address, interpreted as an unsigned number, and each offset, also interpreted as a unsigned number, does not wrap the unsigned address space (add nuw).

For inbounds all rules of the nusw keyword apply. Additionally, if the getelementptr has any non-zero indices, the following rules apply:

The base pointer has an in bounds address of an allocated object, which means that it points into an allocated object, or to its end. Note that the object does not have to be live anymore; being in-bounds of a deallocated object is sufficient.

During the successive addition of offsets to the address, the resulting pointer must remain in bounds of the allocated object at each step.

A corollary is that if the getelementptr is both nusw and nuw, the offset must be non-negative.

It’s worth noting that the nuw flag does not change the fact that GEP indices that are smaller than the index type size are sign extended: As usual, the flag only results in additional poison values, without affecting core semantics. These kinds of GEPs are non-canonical anyway, and of no practical relevance (we should forbid them, but there’s the usual challenges around constant expressions).

Motivation

There are two primary motivations to introduce additional flags. The first is to improve our ability to preserve GEP flags. The second is to better capture frontend semantics in IR.

Current uses of inbounds

While inbounds provides a guarantee that the pointer arithmetic remains in bounds of an allocated object, we almost always only make use of the no-wrap properties this implies.

Specifically, inbounds is mostly used for its implication that the offset arithmetic is nsw and the base plus offset arithmetic is nusw. Usually we additionally need to know that the offset is also non-negative, in which case nusw can be strengthened to the much more useful nuw property.

I’m currently only aware of two places where we perform optimizations that depend on the “allocated object” property:

A special case in alias analysis: If we have an inbounds GEP with known offset N, then it cannot alias with an identified object smaller than N.
We know that getelementptr inbounds p, x is non-null if p is non-null, in cases where no allocated objects are allowed at the null address. While this holds with inbounds, just nusw is not sufficient to derive this fact. However, nuw is sufficient.

As such, nusw captures nearly all the value of inbounds, especially when also combined with nuw.

Flag preservation

The current getelementptr inbounds attribute is unusually challenging to preserve, in multiple ways.

The first is that analyses like scalar evolution (SCEV) track overflow behavior of expressions and recurrences, but are not equipped to reason about allocated objects. For that reason, getelementptr instructions produced by SCEVExpander are never annotated with inbounds.

However, while preserving inbounds information in SCEV is not feasible, it could easily preserve no-wrap information.

The second preservation problem relates to reassociation transforms. The following reassociation is, in general, illegal:

getelementptr inbounds (getelementptr inbounds p, i1), i2
->
getelementptr inbounds (getelementptr inbounds p, i2), i1

This is because i1 and i2 may have different signs, and changing the order of the additions may result in the arithmetic temporarily going out of bounds. This issue also applies when replacing inbounds with nusw. However, the following reassociation with nuw is legal:

getelementptr nuw (getelementptr nuw p, i1), i2
->
getelementptr nuw (getelementptr nuw p, i2), i1

Preserving no-wrap / non-negative information from the frontend

For expressions like ary[idx], the source language may specify that if idx is an unsigned type, the addition of ary and idx cannot overflow in an unsigned sence, or that idx cannot be negative. This kind of information is currently completely lost when translating to LLVM IR.

This information is valuable for optimizations in a number of ways. For example, it allows us to optimize expressions like ary[idx] >= ary to true instead of idx >= 0.

Another example is illustrated by structures with embedded arrays:

struct SmallVec {
   size_t len;
   T elems[MaxLen];
};

With getelementptr nuw, we can know that a write to elems[idx] will not clobber the len member, which can be critical to enable follow-on optimizations.

We will definitely be able to make use of getelementptr nuw in Rust. I’m not sure whether there is consensus that that using it for unsigned offsets is valid in C++ (the relevant wording is unclear), but at least UBSan in clang already diagnoses it, so it seems likely.

Allow frontends to experiment with different pointer arithmetic semantics

C and C++ make non-inbounds pointer arithmetic undefined behavior. Because LLVM currently only supports these semantics, other frontends are forced to adopt the same semantics.

The Rust language is considering to instead only require no-wrap for pointer arithmetic, but cannot do so unless LLVM provides a way to encode this. (Whether Rust actually ends up doing this is still unclear at this point, as there are advantages and disadvantages to both semantics – however, currently it’s not even possible to experiment with this.)

Alternatives

Use nsw instead of nusw

The nusw terminology is adopted from SCEV and emphasizes the special wrapping semantics when adding the offset to the base, which are not quite the same as nsw. However, the offset arithmetic itself is nsw, so it wouldn’t be particularly wrong to use the more familiar attribute name either. I don’t have a strong opinion on this.

Use nneg instead of nuw

Instead of adding a nuw flag, an alternative would be to add nneg instead, which specifies that the offset must be non-negative.

If the nusw or inbounds flag is present, then nuw and nneg are equivalent. However, if nusw is not present, then their semantics differ and neither implies the other.

In that case, I believe that nuw enables more optimizations. For example, gep nneg p, x does not imply that the p + x addition is nuw, and also doesn’t imply that (gep nneg p, x) uge p is true. (It does imply these when combined with inbounds/nusw.)

Additionally, nuw could be useful if LLVM ever allowed allocated objects to exceed half the size of the address index space.

Don’t make inbounds imply nusw

The inbounds property could be separated from nusw by saying that it only requires the original pointer and the result pointer to be in bounds of an allocated object – while this has some no-wrap implications, it has less than the full set of nusw requirements.

I don’t think doing this is a good idea, because I’m not aware of a use-case for it, while it would increase the amount of different combinations we have to reason about.

Only add the nuw flag

Most of the value of this proposal would still be captured if we only added the nuw flag, without the nusw flag. It is sufficient to convey key information about frontend semantics to the optimizer.

However, I still think that adding nusw has value, because it allows us to more clearly express intent in analyses and optimizations. Rather than working with inbounds but only using its nusw properties, we could explicitly work with nusw.

nusw is also useful to enable frontends to experiment with different pointer wrapping semantics (though just nuw might also be “good enough” for that purpose).

On the other hand, adding nusw means we have to think about an extra flag when performing optimizations on GEPs.

efriedma-quic · April 30, 2024, 7:45am

As far as I know, using nuw for unsigned array indexing in C/C++ should be fine; I’m not aware of any ambiguity.

Is ptradd_nusw(ptradd_nusw(a, b), c) supposed to be equivalent to ptradd_nusw(a,b+c)? If I’m following the rules you’re describing correctly, it isn’t equivalent because b+c could have signed overflow. Do we currently do this transform with inbounds? (This is sort of related to the “allocated object” property if you assume allocated objects are less than half the address-space.)

nikic · April 30, 2024, 8:59am

That’s good to know. I was a bit unsure because it’s not explicitly called out in [expr.add], but I assume the logic is that if j is unsigned then i + j will be performed with unsigned arithmetic and end up violating the <= n condition.

Agree that these are not equivalent.

I’m not aware of us doing this transform anywhere. Our canonicalization is in the reverse direction, i.e. from ptradd(p, a+b) to ptradd(ptradd(p, a), b). That transform can only preserve inbounds only if the add is nsw and also has non-negative arguments ([InstCombine] Preserve inbounds when canonicalizing gep+add by nikic · Pull Request #90160 · llvm/llvm-project · GitHub). The same would be true to nusw. For nuw we could preserve it if the add is nuw.

tschuett · April 30, 2024, 11:22am

The GlobalIsel IRTranslator sometimes adds a nuw to ptradd:

github.com

llvm/llvm-project/blob/6fab3f2a2b04048aaa7d76d067f6cd4704bb4002/llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp#L1676


      
                  GepOffsetReg = IdxReg;
          
                BaseReg = MIRBuilder.buildPtrAdd(PtrTy, BaseReg, GepOffsetReg).getReg(0);
              }
            }
          
            if (Offset != 0) {
              auto OffsetMIB =
                  MIRBuilder.buildConstant(OffsetTy, Offset);
          
              if (int64_t(Offset) >= 0 && cast<GEPOperator>(U).isInBounds())
                Flags |= MachineInstr::MIFlag::NoUWrap;
          
              MIRBuilder.buildPtrAdd(getOrCreateVReg(U), BaseReg, OffsetMIB.getReg(0),
                                     Flags);
              return true;
            }
          
            MIRBuilder.buildCopy(getOrCreateVReg(U), BaseReg);
            return true;
          }

jayfoad · April 30, 2024, 1:23pm

More pedantically “… the low N bits of the current address, interpreted as an unsigned number, … does not wrap the unsigned N-bit range”, where N is pointer index size in bits. Right?

aeubanks · April 30, 2024, 8:36pm

Thanks for working on this! We’ve definitely seen cases of bounds checks not optimized away due to dropping frontend unsigned index information.

Could you explain this part more? I can vaguely see the link between allocated objects not exceeding half the address index space size and nuw/nneg, but am failing to grasp the details.

nikic · May 1, 2024, 1:49am

Yes, that’s right. The base + offset addition only works on the low index type width bits, and the no-wrap conditions only apply to that part as well.

If we, hypothetically, assumed that LLVM supports allocated objects larger than half the address space, and a frontend wanted to make use of that capability, it could emit getelementptr nuw (without inbounds) to work with such objects, while still retaining some useful no-wrap properties. It would not be able to use getelementptr nneg, as the unsigned index may legitimately be large enough to become negative when interpreted as a signed number.

Having given this some more thought, I think that nuw is overall a lot more useful in terms of optimization capabilities. When combined with inbounds/nusw both flags are equivalent anyway, so the interesting case to consider is when we only have nuw/nneg. In that case, nneg does not actually guarantee things like (gep nneg p, x) uge p.

So I think if we just viewed nneg as an addition to inbounds then that would be fine, but if it should also stand on its own, then nuw is preferable.

(Edit: I’ve updated the proposal to that effect.)

nikic · May 2, 2024, 6:36am

I’ve created a patch for this RFC here: [IR] Add getelementptr nusw and nuw flags by nikic · Pull Request #90824 · llvm/llvm-project · GitHub

hansw2000 · May 2, 2024, 12:24pm

+1 getting this fixed will be nice.

As the getelementptr inbounds rules are unusually complex, I’ll provide a draft of the proposed LangRef wording here:

They are complex indeed Could we include examples in the LangRef of when each flag would apply and why? (Maybe based on C or Rust code, or maybe just IR; the important part would be the explanations.)

fhahn · May 2, 2024, 6:05pm

Agreed, thanks for pushing this forward!

aeubanks · May 3, 2024, 10:45pm

nneg vs nuw makes sense, thanks.

I don’t see how this really increases the number of different combinations to reason about, you still have to reason about nusw vs the allocated object property of inbounds separately with the current proposal to figure out if you can preserve inbounds/nusw. Orthogonality of attributes is generally a good idea. Requiring nusw when inbounds is confusing as an invariant. For example, in the PR setting inbounds also sets nusw, but clearing inbounds doesn’t clear nusw.

RalfJung · May 4, 2024, 7:29am

If inbounds implies nusw, then clearing inbounds should not affect nusw. However, clearing nusw should clear inbounds. This would ensure the invariant: if inbounds is set, then nusw is set.

nikic · May 4, 2024, 11:47am

It’s probably worth clarifying here what part of nusw is “naturally” implied by inbounds and what is an additional requirement.

The nusw rules have four parts: The first three relate to the offset arithmetic (truncation, multiplication by type size and addition of multiple offsets), and the fourth to the base plus offset addition. I believe that the “in bounds” property (plus our usual requirements for allocated objects) directly imply the fourth rule, but not the first three, at least as phrased here.

However, only the base + offset addition is actually fundamental to the GEP operation. In ptradd representation, this is the only part of GEP that remains. The arithmetic for deriving the offset is represented using separate instructions. So once the GEP → ptradd migration is complete, we would have a direct implication of inbounds → nusw. It would no longer be a separate requirement.

From a migration perspective, I think that having inbounds imply nusw makes things a lot simpler, because existing code (in LLVM, Clang and other frontends) generating inbounds GEP will continue to be optimized well without having to be changed to explicitly set the nusw flag as well.

We can start switching code to check hasNoUnsignedSignedWrap() instead of isInBounds() right away, without having to worry about frontends not setting it or it not being preserved, etc.

That’s exactly what the implementation does.

lukel · May 5, 2024, 1:37pm

This would help with an issue in the WebAssembly backend where we are unable to use the offset addressing mode for loads like:

extern int *data;
int f(unsigned idx) {
  return data[idx];
}

Due to the effective address calculation being unsigned

aeubanks · May 6, 2024, 10:36pm

I see, so long term with ptradd, the nsw index addition portion of gep nusw will be on the separate add/trunc/mul instructions and nusw will just apply to base + offset. The allocated object portion of the inbounds requirements does seem to imply nusw on base + offset.

Yes, this maintains that invariant, but it’s still confusing that nusw and inbounds are separate properties that have some invariant only known by reading the LangRef. Perhaps a better way to model this is with a tristate flag none (can be implicit)/nusw/inbounds, rather than having nusw and inbounds be separate flags?

nikic · May 15, 2024, 1:17pm

I think that from an implementation perspective, it is more convenient to treat these as implied flags – that is, I want hasNoUnsignedSignedWrap() to return “true” if inbounds is set, rather than having to check for both flags separately.

I’ve updated the implementation to introduce a new GEPNoWrapFlags class (see last commit on [IR] Add getelementptr nusw and nuw flags by nikic · Pull Request #90824 · llvm/llvm-project · GitHub) which holds the three flags (and enforces the inbounds → nusw implication). It currently implicitly converts bool to InBounds to be compatible with current usage, and for now also retains setInBounds() – but in the future, we should both remove that implicit conversion and only have setNoWrapFlags(). I think this should mostly address your concern, because we no longer end up passing around three flags that are not actually fully independent around everywhere.

RalfJung · May 16, 2024, 6:06am

A tristate flag could still have an hasNoUnsignedSignedWrap getter that returns true when the flag is either inbounds or nusw, no?

aeubanks · May 16, 2024, 4:56pm

I think with GEPNoWrapFlags, there’s not really a difference between having a tristate flag and two separate flags with an invariant between them since all of the invariants are held within the class and users don’t have to worry about it. Bitcode-wise it’s also going to be the same. With textual IR the question is do we want to allow the slightly redundant “gep inbounds nsw” or not, which probably isn’t a super important choice. And with two separate flags GEPNoWrapFlags is slightly easier to implement, at least only given the one invariant that nsw must be true if inbounds. If we add more flags to GEPs we may have to revisit, but I think we can easily redesign without making huge changes across GEP APIs and bitcode representation.

hazzlim · May 22, 2024, 2:42pm

Thanks for all your work on this! When the patch for this RFC lands in LLVM is anybody looking into how to use it within Clang? If not then we at Arm are interested in taking a look.

nikic · July 1, 2024, 9:05am

The patch has since landed, and I’ve done some initial work on supporting the flags on the LLVM side. I’d be happy to leave the Clang-side integration to you.

Topic		Replies	Views
`getelementptr nusw` semantics question IR & Optimizations	2	189	November 5, 2024
new LLVM IR features LLVM Dev List Archives	4	90	July 29, 2009
RFC: inbounds on getelementptr indices for global splitting LLVM Dev List Archives	4	96	July 26, 2016
Question regarding getElementPtr/Addressing modes in backend LLVM Dev List Archives	6	103	October 24, 2014
[RFC] Add nowrap flags to trunc IR & Optimizations	9	793	April 1, 2024