Description
Per #73, discussion is converging on unions having no validity invariant, i.e., any bit-pattern is valid for a union type, including completely uninitialized memory. In safe Rust, however, not all valid bit-patterns can necessarily be created, because unions are checked for initialization:
// crate definer:
#[repr(C)]
pub union U { pub i: i32 }
let u: U;
unsafe { &u.i } // Error: u is not initialized
Because union fields cannot be read in safe Rust, however, we need to decide which, if either, of the following two functions is unsound:
// crate producer:
pub fn new_u() -> definer::U { unsafe { std::mem::uninitialized() } }
// crate consumer:
pub fn get_i(u: definer::U) -> i32 { unsafe { u.i } }
At least one must be unsound, because another crate outside the trust boundary of either of them can call `consumer::get_i(producer::new_u())`, which is clearly UB.
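The scenario can be sketched in a single file, with modules standing in for the three crates. This is only an illustration under stated assumptions: `MaybeUninit::uninit().assume_init()` is used as the modern stand-in for the deprecated `std::mem::uninitialized()`, and `new_u_init` is a hypothetical helper (not part of the original example) added so that there is a composition that is defined in every case.

```rust
// Stand-in for the `definer` crate.
pub mod definer {
    #[repr(C)]
    pub union U {
        pub i: i32,
    }
}

// Stand-in for the `producer` crate.
pub mod producer {
    use super::definer::U;
    use std::mem::MaybeUninit;

    /// Mirrors `new_u` from the example: returns a union whose bytes are
    /// uninitialized. Whether exposing this from a safe function is sound
    /// is exactly the open question.
    pub fn new_u() -> U {
        unsafe { MaybeUninit::<U>::uninit().assume_init() }
    }

    /// Hypothetical helper: produces a value that safe code could also build.
    pub fn new_u_init(i: i32) -> U {
        U { i }
    }
}

// Stand-in for the `consumer` crate.
pub mod consumer {
    use super::definer::U;

    /// Mirrors `get_i` from the example: reads the field as an `i32`.
    /// This is UB if the bytes of `u` are uninitialized.
    pub fn get_i(u: U) -> i32 {
        unsafe { u.i }
    }
}
```

A third crate can write `consumer::get_i(producer::new_u_init(7))` and `consumer::get_i(producer::new_u())` without any `unsafe` of its own. The first is fine; the second reads an uninitialized `i32`, so the blame has to land on `producer`, `consumer`, or both.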
To summarize the Zulip background, @RalfJung expressed the opinion that, by default, unions have an unspecified safety invariant and therefore, in the absence of clear documentation from the `definer` crate on a safety invariant, both the `producer` and `consumer` crates are unsound. The `producer` should not create a value which cannot be created with safe Rust, and the `consumer` crate cannot assume that the value has any properties.
@Lokathor brought up that the safe transmute project is also interested in safe union field access in situations where all fields can be safely transmuted to one another. That is to say, `unsafe` would not be required to use a type like `union { i: i32, f: f32 }`. Since this is trivially true of one-field unions, this would mean that `get_i` would be not only sound but actually safe.
This requires a safety invariant, as clearly the uninitialized union would cause safe field access to be unsound. In reply, @RalfJung suggested that for unions he would simply have suggested the trivial invariant, i.e., all values are safe, including uninitialized ones, but that this safe transmutation would suggest an additional invariant. Note that if all unions have a trivial safety invariant, then the `producer` crate above would be sound.
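To make the safe-transmute case concrete, here is a small sketch of the kind of union under discussion. The names `IntOrFloat`, `float_bits_via_union`, and `float_bits_via_std` are made up for illustration. Today the field read still needs `unsafe`; the proposal would let the compiler accept it as safe, provided the union value is known to be initialized.

```rust
/// A union whose fields can be transmuted to one another:
/// every initialized 4-byte pattern is a valid `i32` and a valid `f32`.
#[repr(C)]
pub union IntOrFloat {
    pub i: i32,
    pub f: f32,
}

/// Type punning through the union: write the `f32` field, read the `i32` field.
pub fn float_bits_via_union(x: f32) -> i32 {
    let u = IntOrFloat { f: x };
    // `unsafe` is required today, even though this read is always defined
    // for a union that was initialized through one of its fields.
    unsafe { u.i }
}

/// The same reinterpretation using the standard library, for comparison.
pub fn float_bits_via_std(x: f32) -> i32 {
    x.to_bits() as i32
}
```

Both functions return the same bits for any input. The union version would become UB only if it were handed an `IntOrFloat` with uninitialized bytes, which is why safe field access needs more than the trivial invariant.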
So what actually is the safety invariant? Some options:
1. Unspecified safety invariant: both `producer` and `consumer` are unsound.
2. An "initializedness" safety invariant: `producer` is unsound, but `consumer` is sound (and possibly later safe with compiler support), because safe code cannot produce the uninitialized value.
3. A more complex safety invariant based on safety of mutual field transmutation, to allow general type punning using unions.
   - This could be automatic for all types with mutual transmutation; it is viable from a logical and technical point of view but may be hard to apply in practice in `unsafe` code, as it would require the coder to carefully think about whether the union has safe transmutation. It would also create a risk that changes to the union definition would silently create new UB (e.g. by adding a new field to a struct member of the union, which is normally not a breaking change).
   - There was at one point a suggestion that safety could be defined as "this value can be arrived at through an arbitrary sequence of safe operations." This is also logically possible, but probably even worse, and probably also useless in practice.
   - @CAD97 suggested a new attribute, e.g. `#[safe_transmute_union]`, which could allow an opt-in mutual transmutation safety invariant. Thus either 1 or 2 would apply by default, but the attribute would change the invariant.
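For comparison, an "initializedness" invariant can already be enforced by hand when the `definer` crate keeps the field private and documents the invariant itself. The following is a minimal sketch, not a proposal from the discussion; the methods `new`, `get_i`, and `set_i` are hypothetical.

```rust
pub mod definer {
    /// Documented safety invariant (an assumption of this sketch):
    /// the bytes of `i` are always initialized.
    #[repr(C)]
    pub union U {
        i: i32, // private, so other crates cannot read or write it directly
    }

    impl U {
        /// Safe constructor: the only safe way to obtain a `U`.
        pub fn new(i: i32) -> Self {
            U { i }
        }

        /// Safe accessor.
        pub fn get_i(&self) -> i32 {
            // SAFETY: relies on the documented invariant that `i` is initialized.
            unsafe { self.i }
        }

        /// Writing a `Copy` union field is already a safe operation.
        pub fn set_i(&mut self, i: i32) {
            self.i = i;
        }
    }
}
```

With this shape, a crate that conjures an uninitialized `U` through `unsafe` code and hands it out from a safe function violates a documented invariant and is clearly the unsound party. Option 2 would make that the default for unions with public fields as well.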
I am extremely partial to this last option because it makes it clear when a union does or does not support this type of invariant. And I do not like the idea of unspecified safety invariants, so I would go with option 2 for the default. (edit: see below)
I'm opening a separate issue about the offset of the field (which is why `#[repr(C)]` is required in the example).