(This follows up on a previous discussion in Retain fact that index/offsets for GEP are non-negative for `size_t/uint64_t` offsets? as an RFC.)
Proposal
Add two new flags to the getelementptr instruction, in addition to the current inbounds
flag. The two new flags are:
nusw
(no-unsigned-signed wrap). This carries all the overflow-related requirements frominbounds
, without the allocated object requirements.inbounds
impliesnusw
and the latter is not explicitly printed ifinbounds
is set.nuw
(no-unsigned-wrap), specifies that both the offset arithmetic and the addition of the offset to the base address do not wrap in an unsigned sense.
As the getelementptr inbounds rules are unusually complex, I’ll provide a draft of the proposed LangRef wording here:
The getelementptr instruction may have a number of attributes with different rules. If any of the rules are violated, the result value is a poison value. In cases where the base is a vector of pointers, the attributes apply to each computation element-wise.
For
nusw
(no unsigned signed wrap):
- If the type of an index is larger than the pointer index type, the truncation to the pointer index type preserves the signed value (trunc nsw).
- The multiplication of an index by the type size does not wrap the pointer index type in a signed sense (mul nsw).
- The successive addition of each offset (without adding the base address) does not wrap the pointer index type in a signed sense (add nsw).
- The successive addition of the current address, interpreted as an unsigned number, and each offset, interpreted as a signed number, does not wrap the unsigned address space.
For
nuw
(no unsigned wrap).
- If the type of an index is larger than the pointer index type, the truncation to the pointer index type preserves the unsigned value (trunc nuw).
- The multiplication of an index by the type size does not wrap the pointer index type in an unsigned sense (mul nuw).
- The successive addition of each offset (without adding the base address) does not wrap the pointer index type in a unsigned sense (add nuw).
- The successive addition of the current address, interpreted as an unsigned number, and each offset, also interpreted as a unsigned number, does not wrap the unsigned address space (add nuw).
For
inbounds
all rules of thenusw
keyword apply. Additionally, if the getelementptr has any non-zero indices, the following rules apply:
- The base pointer has an in bounds address of an allocated object, which means that it points into an allocated object, or to its end. Note that the object does not have to be live anymore; being in-bounds of a deallocated object is sufficient.
- During the successive addition of offsets to the address, the resulting pointer must remain in bounds of the allocated object at each step.
A corollary is that if the getelementptr is both nusw and nuw, the offset must be non-negative.
It’s worth noting that the nuw flag does not change the fact that GEP indices that are smaller than the index type size are sign extended: As usual, the flag only results in additional poison values, without affecting core semantics. These kinds of GEPs are non-canonical anyway, and of no practical relevance (we should forbid them, but there’s the usual challenges around constant expressions).
Motivation
There are two primary motivations to introduce additional flags. The first is to improve our ability to preserve GEP flags. The second is to better capture frontend semantics in IR.
Current uses of inbounds
While inbounds
provides a guarantee that the pointer arithmetic remains in bounds of an allocated object, we almost always only make use of the no-wrap properties this implies.
Specifically, inbounds is mostly used for its implication that the offset arithmetic is nsw and the base plus offset arithmetic is nusw. Usually we additionally need to know that the offset is also non-negative, in which case nusw can be strengthened to the much more useful nuw property.
I’m currently only aware of two places where we perform optimizations that depend on the “allocated object” property:
- A special case in alias analysis: If we have an inbounds GEP with known offset N, then it cannot alias with an identified object smaller than N.
- We know that
getelementptr inbounds p, x
is non-null if p is non-null, in cases where no allocated objects are allowed at the null address. While this holds with inbounds, just nusw is not sufficient to derive this fact. However, nuw is sufficient.
As such, nusw captures nearly all the value of inbounds, especially when also combined with nuw.
Flag preservation
The current getelementptr inbounds
attribute is unusually challenging to preserve, in multiple ways.
The first is that analyses like scalar evolution (SCEV) track overflow behavior of expressions and recurrences, but are not equipped to reason about allocated objects. For that reason, getelementptr instructions produced by SCEVExpander are never annotated with inbounds
.
However, while preserving inbounds information in SCEV is not feasible, it could easily preserve no-wrap information.
The second preservation problem relates to reassociation transforms. The following reassociation is, in general, illegal:
getelementptr inbounds (getelementptr inbounds p, i1), i2
->
getelementptr inbounds (getelementptr inbounds p, i2), i1
This is because i1 and i2 may have different signs, and changing the order of the additions may result in the arithmetic temporarily going out of bounds. This issue also applies when replacing inbounds with nusw. However, the following reassociation with nuw is legal:
getelementptr nuw (getelementptr nuw p, i1), i2
->
getelementptr nuw (getelementptr nuw p, i2), i1
Preserving no-wrap / non-negative information from the frontend
For expressions like ary[idx]
, the source language may specify that if idx
is an unsigned type, the addition of ary
and idx
cannot overflow in an unsigned sence, or that idx
cannot be negative. This kind of information is currently completely lost when translating to LLVM IR.
This information is valuable for optimizations in a number of ways. For example, it allows us to optimize expressions like ary[idx] >= ary
to true
instead of idx >= 0
.
Another example is illustrated by structures with embedded arrays:
struct SmallVec {
size_t len;
T elems[MaxLen];
};
With getelementptr nuw
, we can know that a write to elems[idx]
will not clobber the len
member, which can be critical to enable follow-on optimizations.
We will definitely be able to make use of getelementptr nuw in Rust. I’m not sure whether there is consensus that that using it for unsigned offsets is valid in C++ (the relevant wording is unclear), but at least UBSan in clang already diagnoses it, so it seems likely.
Allow frontends to experiment with different pointer arithmetic semantics
C and C++ make non-inbounds pointer arithmetic undefined behavior. Because LLVM currently only supports these semantics, other frontends are forced to adopt the same semantics.
The Rust language is considering to instead only require no-wrap for pointer arithmetic, but cannot do so unless LLVM provides a way to encode this. (Whether Rust actually ends up doing this is still unclear at this point, as there are advantages and disadvantages to both semantics – however, currently it’s not even possible to experiment with this.)
Alternatives
Use nsw instead of nusw
The nusw
terminology is adopted from SCEV and emphasizes the special wrapping semantics when adding the offset to the base, which are not quite the same as nsw
. However, the offset arithmetic itself is nsw
, so it wouldn’t be particularly wrong to use the more familiar attribute name either. I don’t have a strong opinion on this.
Use nneg instead of nuw
Instead of adding a nuw
flag, an alternative would be to add nneg
instead, which specifies that the offset must be non-negative.
If the nusw
or inbounds
flag is present, then nuw
and nneg
are equivalent. However, if nusw
is not present, then their semantics differ and neither implies the other.
In that case, I believe that nuw
enables more optimizations. For example, gep nneg p, x
does not imply that the p + x
addition is nuw, and also doesn’t imply that (gep nneg p, x) uge p
is true. (It does imply these when combined with inbounds/nusw.)
Additionally, nuw
could be useful if LLVM ever allowed allocated objects to exceed half the size of the address index space.
Don’t make inbounds imply nusw
The inbounds
property could be separated from nusw
by saying that it only requires the original pointer and the result pointer to be in bounds of an allocated object – while this has some no-wrap implications, it has less than the full set of nusw
requirements.
I don’t think doing this is a good idea, because I’m not aware of a use-case for it, while it would increase the amount of different combinations we have to reason about.
Only add the nuw flag
Most of the value of this proposal would still be captured if we only added the nuw
flag, without the nusw
flag. It is sufficient to convey key information about frontend semantics to the optimizer.
However, I still think that adding nusw
has value, because it allows us to more clearly express intent in analyses and optimizations. Rather than working with inbounds but only using its nusw properties, we could explicitly work with nusw.
nusw is also useful to enable frontends to experiment with different pointer wrapping semantics (though just nuw might also be “good enough” for that purpose).
On the other hand, adding nusw means we have to think about an extra flag when performing optimizations on GEPs.