On a related topic: The source-level debugging descriptors require you
to know up front what the sizes of pointer types are. Is there any hope of
the frontend remaining blissfully unaware of platform details?
I really don't know how to do this. The current debug info stuff depends on emitting size info into the IR. At this point, I don't think there is a good way around this. Improvements to the design are welcome, of course.
-Chris
As I understand it, this issue and others like it all require a difficult step to be taken, which is to introduce the concept of a constant whose value is not known until code generation time, or at least until the compilation target is fully known. These "late-bound constants" could then be used to implement "sizeof(type)" and other constants whose value is different on different targets.
On the C++ side, you would have ConstantSize(Type), which would be usable anywhere that you could use ConstantInt, except that you can't actually inspect the integer value of the constant or use it in expressions. That part is fairly simple; it gets more complicated if you want to try defining operators such as sizeof(A) + sizeof(B). This requires the IR to support arbitrary expressions, which I don't think you want. But most of the time, you don't want to add the sizes of A and B, you want sizeof({A, B}), which doesn't require any special syntax other than sizeof itself.
So in other words, the frontend sees the "sizeof" constant purely as a symbolic, opaque object, while the code generator simply converts it into a ConstantInt.
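To make the idea concrete, here is a rough sketch of a late-bound size constant. It does not use the LLVM API: Target, Type, layoutOf and SizeOfConstant are all hypothetical names invented for illustration, and the layout rules (natural alignment for integers and pointers, C-style struct padding) are assumptions rather than anything LLVM guarantees.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical target description, known only at code generation time.
struct Target {
  std::uint64_t pointerBytes;
};

// Hypothetical miniature type system: integers, pointers, and structs.
struct Type {
  enum Kind { Int, Pointer, Struct } kind;
  std::uint64_t intBytes;    // used when kind == Int
  std::vector<Type> fields;  // used when kind == Struct
};

Type intType(std::uint64_t bytes) { return Type{Type::Int, bytes, {}}; }
Type pointerType() { return Type{Type::Pointer, 0, {}}; }
Type structType(std::vector<Type> fields) {
  return Type{Type::Struct, 0, std::move(fields)};
}

struct Layout {
  std::uint64_t size;
  std::uint64_t align;
};

// Round offset up to the next multiple of align.
std::uint64_t alignTo(std::uint64_t offset, std::uint64_t align) {
  assert(align > 0);
  return (offset + align - 1) / align * align;
}

// Assumed layout rules: scalars are naturally aligned, structs are padded
// the way a typical C ABI pads them.
Layout layoutOf(const Type& type, const Target& target) {
  switch (type.kind) {
    case Type::Int:
      return Layout{type.intBytes, type.intBytes};
    case Type::Pointer:
      return Layout{target.pointerBytes, target.pointerBytes};
    case Type::Struct:
      break;
  }
  Layout result{0, 1};
  for (const Type& field : type.fields) {
    const Layout f = layoutOf(field, target);
    result.size = alignTo(result.size, f.align) + f.size;
    result.align = std::max(result.align, f.align);
  }
  result.size = alignTo(result.size, result.align);
  return result;
}

// The frontend can create and pass this around, but it exposes no integer
// value until something that holds a Target asks for one.
class SizeOfConstant {
 public:
  explicit SizeOfConstant(Type type) : type_(std::move(type)) {}

  std::uint64_t resolve(const Target& target) const {
    return layoutOf(type_, target).size;
  }

 private:
  Type type_;
};
```

Under these assumptions, a SizeOfConstant built from structType({intType(1), pointerType()}) resolves to 8 with Target{4} and to 16 with Target{8}, and the frontend never has to know which.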
This makes filling in the DWARF debugging structures relatively easy as long as you have an LLVM type reference to use as a measuring stick. In fact, I'd likely make the hypothetical "DebugBuilder" API such that most of the info was derived from an LLVM type given as a parameter, with just a few additional parameters to specify the things that cannot be determined just from looking at the LLVM type.
Is there a similar technique that would allow calculation of the
alignment? (which is also required by the DWARF derived-type descriptor.)
There is more than one form of alignment. To find the struct field alignment of something, you can do something like:
"sizeof({i8, T}) - sizeof(T)"
Clever. I'll use that.
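The same trick can be checked in plain C++, which may help show why it works: the padding inserted after a one-byte field is exactly what the next field's alignment demands. This is only an analogy to the IR-level expression. Probe and fieldAlignOf are hypothetical names, and the static_asserts assume a typical ABI where these particular types have the same alignment inside and outside a struct (on some ABIs the two can differ for types such as double, which is the "more than one form of alignment" point).

```cpp
#include <cstddef>
#include <cstdint>

// Mirrors the IR struct {i8, T}: one byte, then padding, then a T.
template <typename T>
struct Probe {
  char pad;
  T value;
};

// sizeof({i8, T}) - sizeof(T): the struct field alignment of T.
template <typename T>
constexpr std::size_t fieldAlignOf() {
  return sizeof(Probe<T>) - sizeof(T);
}

// A struct with tail padding, to show the trick is not limited to scalars.
struct Mixed {
  std::int32_t a;
  char b;
};

static_assert(fieldAlignOf<char>() == alignof(char), "char");
static_assert(fieldAlignOf<std::int32_t>() == alignof(std::int32_t), "int32");
static_assert(fieldAlignOf<void*>() == alignof(void*), "pointer");
static_assert(fieldAlignOf<Mixed>() == alignof(Mixed), "struct");
```

It works because sizeof(T) is always a multiple of T's alignment, so the struct needs no tail padding and its size is exactly the alignment plus sizeof(T).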
However, I feel that when a "trick" like this gets used enough times, that's a signal that it should be codified. Making sizeof() and alignmentof() first-class operations in the IR would have the advantage of making the generated IR clearer, and we already know that it can be done because the tricks exist.
A lot of this thinking comes out of my attempting to create (as I mentioned on the other thread) a generic "DebugBuilder", similar to IRBuilder, that pumps out source level debugging definitions. As much as possible, I want to hide details of the target machine from the user of the API. You ought to be able to hand it an LLVM type, plus a little sprinkling of source-derived metadata to go along with it, and it figures out all the metrics for you.
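As a very rough sketch of the shape such an API might take (nothing here exists in LLVM: TypeKind, TargetInfo, TypeDescriptor and this DebugBuilder class are all hypothetical, and sizes in bits with natural alignment are assumptions made for the example):

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for an LLVM type: just enough variety for the sketch.
enum class TypeKind { I8, I32, Pointer };

// Hypothetical target description, owned by the builder, not the frontend.
struct TargetInfo {
  std::uint64_t pointerBytes;
};

// What the builder emits. The metrics are filled in by the builder.
struct TypeDescriptor {
  std::string name;
  std::uint64_t sizeInBits;
  std::uint64_t alignInBits;
};

class DebugBuilder {
 public:
  explicit DebugBuilder(TargetInfo target) : target_(target) {}

  // The caller supplies the type plus source-derived metadata (the name);
  // size and alignment are worked out from the target.
  TypeDescriptor describe(TypeKind kind, const std::string& name) const {
    const std::uint64_t bytes = sizeInBytes(kind);
    // Assumes natural alignment (alignment == size) for these scalar types.
    return TypeDescriptor{name, bytes * 8, bytes * 8};
  }

 private:
  std::uint64_t sizeInBytes(TypeKind kind) const {
    switch (kind) {
      case TypeKind::I8:
        return 1;
      case TypeKind::I32:
        return 4;
      case TypeKind::Pointer:
        return target_.pointerBytes;
    }
    return 0;  // not reached for valid kinds
  }

  TargetInfo target_;
};
```

The point of the shape is that the frontend code calling describe(TypeKind::Pointer, "char*") is identical on every target; only the TargetInfo handed to the builder changes.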
This is separate from the issue of size_t, which I realize is much more complex because it's not merely a machine-dependent constant, it's a machine-dependent *type*. And unlike constants, types cannot be the product of an expression in LLVM, so there's no handy trick that can be used.
There is more than one form of alignment. To find the struct field
alignment of something, you can do something like:
"sizeof({i8, T}) - sizeof(T)"
Clever. I'll use that.
However, I feel that when a "trick" like this gets used enough times,
that's a signal that it should be codified. Making sizeof() and
alignmentof() first-class operations in the IR would have the advantage
of making the generated IR clearer, and we already know that it can be
done because the tricks exist.
Sure, I'd be fine with adding them as constant exprs. Go for it.
A lot of this thinking comes out of my attempting to create (as I
mentioned on the other thread) a generic "DebugBuilder", similar to
IRBuilder, that pumps out source level debugging definitions. As much as
possible, I want to hide details of the target machine from the user of
the API. You ought to be able to hand it an LLVM type, plus a little
sprinkling of source-derived metadata to go along with it, and it
figures out all the metrics for you.
Yep.
This is separate from the issue of size_t, which I realize is much more
complex because it's not merely a machine-dependent constant, it's a
machine-dependent *type*. And unlike constants, types cannot be the
product of an expression in LLVM, so there's no handy trick that can be
used.