Add rustc flag to disable mutable no-aliasing optimizations?

Other features include:

  • Placement new (this is part of the proposal)
  • The possibility of destructuring out of Drop (not part of the proposal, but would be cool tbh ngl)
  • Better ergonomics for (the existing) MaybeUninit, something along the lines of MaybeUninit.place(|thing| {*thing = foo;}) -> &mut T. (unsure how useful this would be in practice. see also placement new.)
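For what it's worth, part of the third bullet exists today: `MaybeUninit::write` stores a value and hands back a `&mut T` to it. A minimal sketch of that, where the helper name `init_and_extend` is made up for illustration and the closure-based `place` API from the bullet stays hypothetical:

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: initialize a slot in place, then keep working
/// with the `&mut String` that `MaybeUninit::write` returns.
fn init_and_extend(prefix: &str, suffix: &str) -> String {
    let mut slot: MaybeUninit<String> = MaybeUninit::uninit();

    // `write` initializes the slot and returns a `&mut String` to the value.
    let s: &mut String = slot.write(String::from(prefix));
    s.push_str(suffix);

    // SAFETY: the slot was initialized by `write` above.
    unsafe { slot.assume_init() }
}

fn main() {
    println!("{}", init_and_extend("foo", "bar"));
}
```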

Altho we still don't get why ppl call this "typestate" we guess?

In practice, freestanding/embedded C code almost always targets GCC-compatible compilers (or often one specific GCC-compatible compiler) rather than trying to be portable.

In GCC-compatible compilers, most of the things you mentioned have corresponding built-in definitions, which have exactly or almost exactly the same semantics as the standard thing, but different names:

#define offsetof __builtin_offsetof
#define va_arg __builtin_va_arg
typedef __SIZE_TYPE__ size_t;

You could see this in two ways. On one hand, you technically don't have to use any headers. On the other hand, that's only because the compiler has the equivalent of a header built into the compiler binary, i.e. a list of names to define. In theory, Rust's choice to split this between implementation-specific "lang item" names, and a core that exposes those names, can be seen as an implementation detail.
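For comparison, here is a rough sketch of how two of those definitions look from the Rust side, where they are reached through `core` rather than through builtin names. The `Header` struct is invented for the example, and there is deliberately no `va_arg` counterpart shown:

```rust
use core::mem::{align_of, offset_of, size_of};

// Hypothetical C-layout struct, only here to have something to measure.
#[repr(C)]
struct Header {
    tag: u8,
    len: u32,
}

/// Returns (offset of `len`, size of `Header`, alignment of `Header`).
fn layout() -> (usize, usize, usize) {
    (
        offset_of!(Header, len), // plays the role of C's offsetof
        size_of::<Header>(),     // the result type `usize` plays the role of size_t
        align_of::<Header>(),
    )
}

fn main() {
    println!("{:?}", layout());
}
```

None of these calls should leave any trace in the binary beyond the constants themselves, which is the property the builtins are being praised for.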

However, there are important differences between GCC builtins and Rust core.

One, the set of GCC builtins is minimal. For the most part, only things that require special-casing by the compiler get built-in definitions. Anything that can be implemented in pure C is left to the standard library implementation, which can be replaced.

Two, GCC builtins usually satisfy my criterion of "not generat[ing] any additional assembly or trace in the binary whatsoever". They do sometimes expand to calls to helper functions, mostly to emulate specific features (atomics, floats, int128) on targets that don't natively support them. And the compiler does ship with implementations of those helper functions (libgcc for gcc, compiler-rt for clang). But using the compiler's implementation is optional. You can supply your own implementation instead; the interface is somewhat compiler-specific but is stable enough in practice. Or you can just avoid using those features and not have to ship any implementation.

I want Rust core to evolve towards being able to play the same role. By using Cargo feature flags, it should be possible to limit core to items that require special-casing by the compiler, as opposed to the random library code that's also in there. And in cases where compiler special-casing is currently inseparable from some chunk of library code, like Box, the two should be disentangled.

Regarding @jgarvin's point about UnsafeCell::get not being inlined at opt-level=1, I guess I was wrong about how reliable it is. This should be fixed. inline(always) may be good enough, but if not, maybe we can fix whatever blocks the underlying rust-intrinsic functions from being directly exported from core? Or, as @josh mentioned, use the MIR inliner for that.

Also, to be fair to Rust, my praise of GCC builtins only extends as far as pure C. The way GCC-compatible compilers handle C++ is somewhat similar, but much more annoying and inconsistent. And Rust faces some of the same challenges that C++ does: the spec defines a bunch of types that are a hybrid between standard library types and compiler magic, and it's not always clear how to cleanly separate those things.


In the right circumstances, competition, with its associated rewriting-from-scratch and differences in leaders and priorities, can have a role in combating stagnation.

But the "right circumstances" often include the competing projects being funded by corporations for reasons other than just wanting a better compiler.

Consider Clang/LLVM. While LLVM was originally a research project, it was developed into a production compiler that could compete with GCC... because Apple funded it, seemingly in large part in order to avoid the GPL. Still, when it emerged, it was 'disruptive' to GCC in the Silicon Valley sense of the word, offering substantial improvements in several areas, such as:

  • Much nicer diagnostics than what GCC offered at the time. (GCC has since caught up.)

  • An actual API, upon which myriad tools have been built, including Rust's own bindgen. GCC intentionally lacked an API because of fears it could be used to subvert the GPL. To be fair, those fears may well have been justified; the point is now moot, only because vendors who want to ship proprietary compilers can just use the permissively licensed Clang. But most people at the time did not realize how beneficial a compiler API could be, to the point that GCC was cutting off its nose to spite its face. (And GCC is now in the awkward position of having loosened its policy, yet still not having a good API.)

    • The backend, LLVM itself, was even more disruptive to GCC, though it's less relevant to a question about frontends.
  • Supporting cross-compilation to multiple targets from the same compiler binary. In a vacuum, this is not such a hard thing to support, but the GCC codebase was and is structured to assume the target is known at compile time. It took a new, competing codebase to remove that assumption.

Another easy example is JavaScript engines' performance horse race. Again, though, competing JavaScript engines exist because competing browsers exist, and the reason competing browsers exist has a lot to do with power struggles, rather than just "we wanted to make a better browser". (And there aren't as many competitors as there used to be.)


Having multiple projects implementing frontends for the same language has resulted in:

  • collaboration on issues related to a single language standard
  • changes to existing standards
  • the creation of new frontends
  • efforts to maintain ABI compatibility between compilers, tools, libraries, etc.

Sadly it seems this thread has been far-derailed from its original purpose. I still think that a rustc -fno-strict-aliasing would be a beneficial thing to have. Yes, it allows you to do something that the language says you probably shouldn't do, but in some cases it can make the code simpler and more elegant.

I simply don't believe that there's no way to handle references the same way we do now with the exception that aliasing optimizations are disabled entirely on user request, thus allowing you to do whatever kinds of left-handed child of darkness gore magic on your pointers and references that your heart may desire. The compiler doesn't currently even detect the UB of multiple &mut created from a pointer, nor should it. It seems to only be an issue of optimization.

This isn't a "probably shouldn't", this is a "must not". There are existing library interfaces that depend on the exclusive nature of &mut for correctness.

This isn't just an issue of missing optimizations. This is an issue of correctness; Rust with non-exclusive &mut would break existing code. We're not going to add an incompatible dialect of Rust that only works if you pass a non-standard compiler option, and that breaks the assumptions of existing Rust code.
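A small sketch of what relying on exclusivity looks like in practice. The function is hypothetical, not taken from any particular library: both the author of the function and the compiler are entitled to assume that `a` and `b` never overlap.

```rust
/// Hypothetical function whose result depends on `a` and `b` being distinct.
fn set_then_read(a: &mut i32, b: &mut i32) -> i32 {
    *a = 1;
    *b = 2;
    // Because `&mut` is exclusive, the write through `b` cannot have
    // touched `*a`, so this is always 1.
    *a
}

fn main() {
    let (mut x, mut y) = (0, 0);
    println!("{}", set_then_read(&mut x, &mut y));
}
```

If two overlapping `&mut i32` could legally be passed in, the comment in the body would be false, and every function written against that assumption would have to be re-audited.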

You can have shared mutability, by labeling it appropriately, using an abstraction built atop UnsafeCell or atop raw pointers. That abstraction, by design, needs to have a different type, so that Rust code knows if it can rely on exclusive mutability. Code using &mut T may be relying on that exclusive behavior; code using MySharedMutableCell<T> isn't.
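As a minimal sketch of such an abstraction, here is one possible shape for the `MySharedMutableCell<T>` named above, built on `UnsafeCell` and restricted to `Copy` types to keep it simple (it ends up being close to what `std::cell::Cell` already provides). The method names and the `bump_both` helper are assumptions made for the example:

```rust
use std::cell::UnsafeCell;

/// Shared-mutable wrapper: the type itself announces that aliasing
/// mutation is expected, so callers cannot assume exclusivity.
pub struct MySharedMutableCell<T> {
    inner: UnsafeCell<T>,
}

impl<T: Copy> MySharedMutableCell<T> {
    pub fn new(value: T) -> Self {
        Self { inner: UnsafeCell::new(value) }
    }

    pub fn get(&self) -> T {
        // SAFETY: no references into the cell are ever handed out, and the
        // type is not `Sync` (because of `UnsafeCell`), so no other access
        // can be in progress while we copy the value out.
        unsafe { *self.inner.get() }
    }

    pub fn set(&self, value: T) {
        // SAFETY: same argument as in `get`.
        unsafe { *self.inner.get() = value }
    }
}

/// Works correctly even when `a` and `b` are the same cell.
fn bump_both(a: &MySharedMutableCell<u32>, b: &MySharedMutableCell<u32>) -> u32 {
    a.set(a.get() + 1);
    b.set(b.get() + 1);
    a.get()
}

fn main() {
    let counter = MySharedMutableCell::new(0);
    println!("{}", bump_both(&counter, &counter));
}
```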


So, I've wanted this for a while (actually, something stronger than what was requested: I'd like a flag that disables the emission of noalias for &T as well). It's potentially useful, even in code which intends to uphold the visible semantics of references at runtime.

Josh, your concerns about this being a fundamentally unsound compiler flag only really apply if it's used to create multiple overlapping &mut T and &T. I agree this would be undesirable. However, noalias doesn't exactly map onto the rules for &T and &mut T, and disabling it doesn't mean allowing that. As an important example, it impacts outstanding raw pointers in fairly surprising and hard to reason about ways.

Unfortunately, (at least as I understand them) the rules are such that the brief existence of a reference type effectively modifies what all outstanding raw pointers are allowed to do, even after the references no longer exist. For example:

  • Having a &mut T, even briefly, will invalidate any raw pointers to that object. (Even after the &mut T no longer exists, the raw pointers cannot be used).
  • Similarly, having &T, even briefly, also invalidates raw pointers that are later used to write to the T.

This has been a problem in the stdlib before, it makes intrusive and self-referential types very hard to write, and in general it makes unsafe code extremely unergonomic, difficult to write, and hard to follow (you pretty much have to use raw pointers for everything, and avoid creating a reference at any point).

Note that when enabling noalias for &mut T, I believe we also added a hack to only turn it on for types which are Unpin, as a heuristic for non-self-referential types (as mentioned, it causes problems for self-referential types). This is not surprising to me, but it is an example of why I'm not exactly a fan: it seems like such a hack will fail in many cases. Even if I had a self-referential type, I wouldn't think to make it !Unpin if I didn't intend to put it in a Pin (for example, if I intended to manipulate it internally without exposing it to safe code).

Anyway, all this is to say I'm not exactly a fan of adding these aggressive aliasing annotations so pervasively. But, similarly to how many projects use -fno-strict-aliasing... I don't even want it because I intend to violate it.

I want it because even if I did trust my ability to write unsafe code that fully upholds the rules (big if)... I definitely don't trust it for random crates on my dep graph that were written N years ago, before much research into this had been done. Unfortunately, the more time I spend with the UCG group, the more concerned I get about this...

Anyway, the optimizations it enables are honestly pretty situational anyway.

Having a &mut T, even briefly, will invalidate any raw pointers to that object.

If that's true, that's horrifying. I didn't know the rules were quite that strict. This is a pervasive problem with Rust unfortunately, where low-level behavior is poorly documented, if at all. I hope to see Rust gain an ISO standard.

I really hope you're wrong. Yeah, I still do not believe that -fno-strict-aliasing cannot be implemented soundly in Rust. Rust is a compiler and a language like any other. The borrow checker can make things complicated, but as a systems language, Rust has a real obligation to allow low-level and potentially dangerous operations. If you use -fno-strict-aliasing to create a bunch of &mut references and then pass them to code that needs the aliasing guarantees and it screws up and compiles to garbage or gives you wrong answers, that's on you. But the more likely usage is just to allow the programmer to worry less. I suppose I can make do with UnsafeCell for the most part, but I need to be able to have confidence that safe and unsafe code interacting will function correctly. Using pointers everywhere and reborrowing constantly is not acceptable.

Remember, we're not asking for you to change Rust fundamentally. We're asking you to add an optional compiler switch for the benefit of unsafe code, a flag that will be used by rather few developers.

Don't try and rewrite the standard library to allow aliasing etc, don't bother. I'm not asking for that. I'm asking for the option to shoot myself in the foot if I so desire.

I think this thread would be helped a lot if the folks asking for this feature could provide some reasonably real world examples of the Rust code they would like to write with this switch enabled. Presumably, the focus would be on codegen, since it seems like that's the actual problem folks care about. That is, show some Rust code today that you would like to write that has non-ideal codegen. That way, we can talk about the actual problem concretely instead of starting out at the gate with a request for a particular solution to that problem.


It’s a lot less bad than @tcsc made it sound. In particular:

  • Having &mut T only invalidates all the raw pointers except for the one that was used to obtain the &mut T (unless the reference didn't come from any raw pointer). That pointer may be used again once the mutable reference is no longer being used.
  • For the previous points, pointers that are merely copies of each other or resulted from each other through pointer arithmetic are considered to be the same pointer, so if one is turned into a mutable reference temporarily, then all the others aren't invalidated either. So “different” pointers are mostly just the ones that are derived independently from a reference.
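A small sketch of the allowed pattern from the two bullets above, as I understand them: the `&mut` is created from the raw pointer itself, so that same raw pointer can be used again once the reference is dead. The function name is made up. The problematic variant, a second raw pointer derived independently and used after the reference was created, is intentionally left out since it would be UB to run.

```rust
/// Increments `*x` three times, alternating between a raw pointer and a
/// temporary `&mut` derived from that raw pointer.
fn bump_three_times(x: &mut i32) -> i32 {
    let raw: *mut i32 = x; // raw pointer derived from `x`
    // SAFETY: `raw` comes from a live `&mut i32`, and `x` itself is not
    // used again until all accesses through `raw` are finished.
    unsafe {
        *raw += 1;
        {
            // Temporary reference obtained from `raw` itself.
            let r: &mut i32 = &mut *raw;
            *r += 1;
        } // `r` is no longer used past this point
        // Fine under the rules described above: `raw` is the pointer
        // that `r` was obtained from.
        *raw += 1;
        *raw
    }
}

fn main() {
    let mut v = 0;
    println!("{}", bump_three_times(&mut v));
}
```

Running this kind of code under Miri is the practical way to check it against the current model rather than trusting a reading of the rules.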

Similar details come with the restrictions around &T.

I have no idea how integer to pointer casts interact with all of this.


This seems like a lot of gymnastics to unconditionally force an aggressive optimization that some programmers like myself clearly don't even think is always a good idea. I think -fno-strict-aliasing is a bare minimum at this point.

As @tcsc pointed out, and especially after understanding the rules a bit better, it's a matter of me not trusting myself to not violate these rules by accident. If it does happen, it will be a painful bug to find. I'd really just add the option like C and C++ already have and have had for many years -- it's one less thing to worry about. I'd concede most of my other complaints just to see this one fixed.

Also, -C llvm-args='--enable-scoped-noalias=false' doesn't actually disable all codegen noalias handling, does it?

+1 to this. Also, to add: Box<T>, which is rustc's equivalent to std::unique_ptr<T>, uses the same ABI as *mut T by definition, so this is not an issue.

Having multiple competing frontends would be useful in filling spaces that existing frontends don't cover (and that are not in scope to be covered). For example, rustc is bootstrapping heckery. If you want it on a novel platform, then you are SOL. It is also, as people have mentioned, a way to promote the development of a single specification that users can rely upon. Of all the languages I use regularly, Rust is the only one that does not have an actual specification I can rely upon.

Would you suggest that other (current or future) implementations also support this flag? In that case, you may want (or need) to figure out how exactly the flag affects the implementation of the abstract machine (what implementation-defined or unspecified behaviour does it affect, and how does it extend the language in terms of defining undefined behaviour). Hint: saying that it disables mutable noalias optimizations is not sufficient, because that inherently assumes LLVM (which is an incorrect assumption). -fno-strict-aliasing works imo because it can be clearly described how it affects the implementation of the C and C++ abstract machine.


I would also consider what problem hiding undefined behaviour would solve, vs. actually detecting it, through a tool like miri, and if that problem warrants allowing users to enjoin certain optimizations of the implementation.


What a horrible & misguided idea. The premise for the existence of Rust is precisely to prevent this and all the evil this entails. This is a "must not" as it contradicts Rust's core principles.

I'm honestly a bit disappointed that this thread has dragged on for so long without anyone clearly stating this immediately. This thread is analogous to someone (also probably a C++ dev) asking the Haskell community to introduce mutability. There is a plethora of paradigms and programming languages. If a person prefers a different mindset, they are welcome to use the tools that agree with that mindset (C++ is the language for shooting one's own foot, not Rust).

Regarding all the other points raised above, I'll just summarise them as follows:

Rust isn't perfect and is actively being worked on. If there are rough edges, performance issues due to less-than-ideal codegen, etc., then these issues can (and should) be raised and worked on. @tcsc 's point above falls under this category - Rust's unsafe story is being actively worked on, e.g. @RalfJung and others have been on it for quite a while now with good progress. Hopefully Rust will find a way to allow us to implement intrusive & self-referential types in a safe manner in the future. Whatever the final design we'll arrive at simply cannot be "let's just forgo Rust's principles of safety". LLVM is merely an implementation detail which could be replaced in the future with a different implementation (e.g. Cranelift, which is implemented in Rust), so we must not define Rust's semantics in terms of LLVM's details (noalias annotations); rather, the LLVM-based implementation should reflect Rust's semantics.

Last point I want to make, as an aside: C++ is not "state of the art" or the base definition of computing as much as C++ devs tend to view it. For instance, Fortran predates C/C++, has aliasing semantics similar to Rust's (unlike C/C++), and actually has better performance because of that. People ought to stop comparing all languages to C++ as if it were some sort of gold standard, when really, it never was.


Let's see some examples of Rust code so that we have something concrete to work with.


I still think this is actually pretty bad. Note that other raw pointers don't get the same benefit of being able to be used again after the reference is gone.

Consider the internals of std::collections::LinkedList. I don't see how it isn't in violation here if it ever holds a &mut Node<T> to a node that's currently in the list. This would invalidate the pointers to that node that exist in other nodes. If you look at std::collections::LinkedList's implementation, it frequently makes mutable references to nodes, as mutable references are easier to work with than pointers.

Is it possible to write a correct implementation of std::collections::LinkedList? Of course, so long as you always stay in the domain of raw pointers. Unfortunately, Rust's ergonomics try very hard to force you to use references everywhere, which is quite dangerous for unsafe code.

If I'm going to be completely honest... I think we've barely begun to deal with how much existing code isn't correct under these rules, and are getting by because they haven't been enabled, and because the optimizations that make this incorrect rarely apply.


I haven’t read that much of its implementation, but it isn’t in violation anywhere that I’ve looked so far. Feel free to point out any particular point in its code that you think might be problematic, but let me first repeat the second part of my own previous post

If you add a Node to a linked list, it gets created as a Box<Node<T>>. That box is turned into a single raw pointer (more precisely, a NonNull). That raw pointer is then copied into other nodes when those are added. In this logic, there are only multiple copies of a single pointer. If you create a reference to the Node from such a pointer, none of its copies are invalidated.
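A stripped-down sketch of that logic. This is not the actual std code: the `Node` type, `alloc`, and `demo` are simplified stand-ins. It allocates nodes as boxes, turns each box into a single `NonNull`, stores copies of that pointer in more than one place, and then goes through a temporary `&mut Node` made from one of the copies.

```rust
use std::ptr::NonNull;

// Simplified singly linked node; the real list is doubly linked and generic.
struct Node {
    value: i32,
    next: Option<NonNull<Node>>,
}

/// Box the node, then turn the box into one raw (NonNull) pointer.
fn alloc(value: i32) -> NonNull<Node> {
    NonNull::from(Box::leak(Box::new(Node { value, next: None })))
}

fn demo() -> i32 {
    let second = alloc(10);
    let first = alloc(1);
    // SAFETY: both pointers come from leaked boxes that stay alive until
    // they are turned back into boxes at the end of this block.
    unsafe {
        // One copy of `second` is stored inside another node...
        (*first.as_ptr()).next = Some(second);
        // ...and another copy stays in a local (NonNull is Copy).
        let local_copy = second;

        // Temporary `&mut Node` created from one of the copies.
        let node: &mut Node = &mut *local_copy.as_ptr();
        node.value += 5;

        // The copy stored in `first` is still usable afterwards.
        let via_first = (*first.as_ptr()).next.unwrap();
        let result = (*via_first.as_ptr()).value;

        // Give the allocations back to Box so they are freed.
        drop(Box::from_raw(first.as_ptr()));
        drop(Box::from_raw(second.as_ptr()));
        result
    }
}

fn main() {
    println!("{}", demo());
}
```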


For reference, the rules @steffahn is describing come from Stacked Borrows (see also the links here), the current work-in-progress candidate for Rust's aliasing rules. The paper includes proofs of correctness for several kinds of optimizations rustc would like to perform. (Stacked Borrows is even implemented by Miri, so it can dynamically check that you're following the rules!)

But this goes well beyond compiler optimization. User-written code also needs to be able to rely on these rules. If you write code that relies on a different ruleset (a hypothetical -fno-strict-aliasing equivalent) then that code becomes completely incompatible with the rest of the ecosystem, and vice versa. For the concept of "libraries" to continue working, all Rust code must agree on the ruleset for the various kinds of references and pointers.

Fortunately, this does not mean Rust is incapable of expressing the programs you want to write. It just means you have to write them with different types! This is the whole point of types: use different types to mean different things, don't use a compiler flag to change their meaning. No amount of discussion is going to change this.

So to echo burntsushi, if the existing set of types is insufficient for your use cases, you'll need to be more specific about the problems you're trying to solve. While Rust fundamentally cannot offer a compiler flag like you suggest, it's still totally possible that the single aliasing model can be tweaked or extended in new ways (as Stacked Borrows has been several times, and will continue to be in the future). But only if we know what exactly you want to do with it!


cc @RalfJung

Sorry in advance for the length. I've wanted to request something like this for a while, for completely separate motivations than the initial request. I also believe what I want is subtle enough that I should explain it precisely.


In particular, my belief is that &T and &mut T should only impact what raw pointers are allowed to do for the duration of the lifetime of the &T or &mut T. This follows the previous "common knowledge", as described by the part of the UnsafeCell documentation that covers the aliasing rules (to be clear, it also notes that they are in flux):

If you create a safe reference with lifetime 'a (either a &T or &mut T reference) that is accessible by safe code (for example, because you returned it), then you must not access the data in any way that contradicts that reference for the remainder of 'a .

(from UnsafeCell in std::cell - Rust)

In truth, I'd like this to be more-or-less the extent of Rust's aliasing rules. Unfortunately it likely won't be, because it's weaker than (and thus incompatible with) what's assumed by our use of LLVM's noalias attributes on &T and &mut T, which we use and presumably would like to continue using. And so I'd like a way to opt out of that.

Specifically to @InfernoDeity's question about precise semantics, I don't know enough about formal models of programming languages to explain it in terms of the abstract machine. What I want is for the aliasing requirements implied by &T and &mut T to have no impact once those references no longer exist.

  • On compilers using LLVM, this wouldn't be sufficient to use noalias on &T or &mut T the way we do now.
    • However, if the compiler can prove a raw pointer to the type was never created, it's fine if it adds noalias.
    • ... With possible debate around "never created" vs "currently exists" — I'd like this to work for ptr-to-int too (which probably requires "never created"), but I'll concede that is entirely separate and based on the fact that I want ptr-to-int stuff to continue working...
  • This would also be required to disable any hypothetical MIR optimizations that were based on this, although IDK what they would look like.
  • However, this continues to allow libraries and code to rely on the &T and &mut T rules they expect (to @josh's point).
    • That is, (I believe) this doesn't change the semantics of any documented aliasing rules, it changes the semantics of undocumented ones.
    • It also changes ones that would be very hard for libraries to sensibly rely on in the first place, and brings them more in line with the rules they are relying on (for example, I'm unsure how useful stable_deref_trait is under the strict LLVM-noalias-compatible version of the aliasing rules, but it's completely fine here).
  • This doesn't allow users to create a &mut T that aliases any other reference, or a &T whose target gets mutated.
    • All code that does this is incorrect, will be incorrect under this flag, and has been known to be incorrect since before 1.0
    • Allowing it via a flag would fragment the language in a big way and if we're going to do that, it should be via the flag we really want: the one that disables the borrow checker (kidding, kidding).

From @steffahn's post:

I'm not convinced here, but I think the details of this will cause us to get too far into the weeds, so I'm gonna delete the bit I wrote about it, in an effort to keep this already very long reply at least stay focused. I'll try and bring up a thread on the UCG or a github issue if I find concrete problems.

I also do think that the distinction you're making is sufficiently subtle that a lot of code is liable to get it wrong, and has in the past, though.


From @yigal100's post:

@tcsc 's point above falls under this category - Rust's unsafe story is being actively worked on, e.g. @RalfJung and others have been on it for quite a while now with good progress.

While the UCG (which I actively participate in) does help here, largely it's interested in new language features to add workarounds for these cases. Consider addr_of!: one of its use cases is to allow code to create a pointer to a field without going through a reference, because that reference would invalidate other raw pointers to the type.
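A minimal sketch of that use case. The `Pair` type and both function names are invented for illustration: field pointers are taken with `addr_of_mut!` directly from a raw pointer, so no intermediate `&mut` to the struct or its fields is ever created inside `swap_fields`.

```rust
use std::ptr::{self, addr_of_mut};

// Hypothetical struct, only here to have two fields to point at.
struct Pair {
    a: u32,
    b: u32,
}

/// Swaps the two fields through raw pointers only.
///
/// Safety contract (assumed for this sketch): `p` must point to a valid,
/// writable `Pair`.
unsafe fn swap_fields(p: *mut Pair) {
    unsafe {
        // `addr_of_mut!` computes the field address without creating a
        // reference, so it does not assert exclusive access to anything.
        let pa: *mut u32 = addr_of_mut!((*p).a);
        let pb: *mut u32 = addr_of_mut!((*p).b);
        ptr::swap(pa, pb);
    }
}

fn swapped(a: u32, b: u32) -> (u32, u32) {
    let mut pair = Pair { a, b };
    // SAFETY: the pointer refers to a live local `Pair`.
    unsafe { swap_fields(addr_of_mut!(pair)) };
    (pair.a, pair.b)
}

fn main() {
    println!("{:?}", swapped(1, 2));
}
```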

The problem with an approach like this is that addr_of! was only stabilized recently (in Rust 1.51). The vast majority of code that needs to use it isn't doing so, since it didn't exist when that code was written. If I pull down a 3 year old data structure crate via a transitive dependency, it won't use it, and it probably never will.

For example, the stdlib had this bug for many years: "slice::swap violates the aliasing rules" (Issue #80682, rust-lang/rust on GitHub), which, to be clear, is an exact instance of the problem I'm discussing. It was fixed by using addr_of_mut!, a tool that was only added recently. IMO, until fairly recently, we had believed this pattern to be correct. Prior to participating in UCG, I certainly had thought so (after all, it's a raw pointer).
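To make the shape of that problem concrete, here is a hedged sketch; it is not the actual std source. As I understand the issue, the old pattern took two raw pointers via `&mut self[a]` and `&mut self[b]`, and the second reborrow of the slice invalidated the first pointer. The hypothetical `swap_elems` below sidesteps that by deriving both element pointers from a single base pointer.

```rust
use std::ptr;

/// Hypothetical stand-in for a slice swap that never creates a second
/// `&mut` into the slice after the first raw pointer exists.
fn swap_elems<T>(s: &mut [T], a: usize, b: usize) {
    assert!(a < s.len() && b < s.len(), "index out of bounds");

    // Old, problematic shape (shown only as a comment, not executed):
    //   let pa: *mut T = &mut s[a];
    //   let pb: *mut T = &mut s[b]; // reborrows the slice again
    //   ptr::swap(pa, pb);

    // One base pointer; both element pointers are derived from it.
    let base: *mut T = s.as_mut_ptr();
    // SAFETY: both indices were bounds-checked above, and `ptr::swap`
    // permits its two arguments to overlap (the `a == b` case).
    unsafe { ptr::swap(base.add(a), base.add(b)) };
}

fn main() {
    let mut v = [1, 2, 3];
    swap_elems(&mut v, 0, 2);
    println!("{:?}", v);
}
```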

Whatever the final design we'll arrive at simply cannot be "let's just forgo Rust's principles of safety".

I think this is either a complete misinterpretation of my point, or not directed at me. Hard to tell.

My motivation is improved safety. That said, maybe this isn't directed at me, and I do completely disagree about wanting the ability to shoot myself in the foot.


From @burntsushi's post

That is, show some Rust code today that you would like to write that has non-ideal codegen.

I think this is a little confused about the point. A flag like this isn't for improving codegen, it's to disable a set of optimizations that were previously unexploited, but now are, and that are particularly hard to reason about.

For a concrete example, I'd like the code in "slice::swap violates the aliasing rules" (Issue #80682, rust-lang/rust on GitHub) to be correct as it was, without needing addr_of_mut!, as discussed above.

Specifically, while I can live with addr_of!, I'd like a flag that allows old, more naive unsafe code that I might be pulling in via a transitive dependency (written before we knew as much about this as we know now, and before there were even tools like addr_of! that could have been used to avoid the problem) to not be miscompiled because the compiler started exploiting a particular kind of UB.

(Honestly, I feel like it should have come from the other side. The people who wanted to turn on a dangerous UB-exploiting optimization should have had to justify it by showing the bad codegen (this feels especially true given that it needed a hack/heuristic to allow things like !Unpin to continue working)... But such is life).
