
What is the safety invariant, if any, for unions? #352

Open
@alercah

Per #73, discussion is converging on unions having no validity invariant, i.e., any bit-pattern is valid for a union type, including completely uninitialized memory. In safe Rust, however, not all valid bit-patterns can necessarily be created, because unions are checked for initialization:

// crate definer:
#[repr(C)]
pub union U { pub i: i32 }
let u: U;
unsafe { &u.i } // Error: u is not initialized

Because union values cannot be consumed in safe Rust, however, we need to decide which, if either, of the following two functions is unsound:

// crate producer:
pub fn new_u() -> definer::U { unsafe { std::mem::uninitialized() } }
// crate consumer:
pub fn get_i(u: definer::U) -> i32 { unsafe { u.i } }

At least one must be unsound, because another crate outside the trust boundary of either of them can call consumer::get_i(producer::new_u()), which is clearly UB.
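For anyone who wants to compile the scenario in a single file, here is a minimal sketch that models the three crates as modules. The module layout and the use of `MaybeUninit` (standing in for the deprecated `std::mem::uninitialized`) are assumptions made for illustration, not part of the original example. The problematic composition is left as a comment, since executing it would be UB.

```rust
// Single-file sketch: each module stands in for one of the crates above.
pub mod definer {
    #[repr(C)]
    pub union U {
        pub i: i32,
    }
}

pub mod producer {
    use super::definer::U;
    use std::mem::MaybeUninit;

    // Returns a `U` whose bytes are uninitialized. Whether this function is
    // sound is exactly the open question: it depends on the safety invariant of `U`.
    pub fn new_u() -> U {
        unsafe { MaybeUninit::<U>::uninit().assume_init() }
    }
}

pub mod consumer {
    use super::definer::U;

    // Reads `i` from any `U` it is handed, assuming the field is initialized.
    pub fn get_i(u: U) -> i32 {
        unsafe { u.i }
    }
}

// A third party needs no `unsafe` to write the composition below, yet it
// would read uninitialized memory as an `i32`:
//
//     let _ = consumer::get_i(producer::new_u());
```

Calling `consumer::get_i(definer::U { i: 7 })` is unproblematic, since that value was built by safe code and is therefore initialized.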

To summarize the Zulip background, @RalfJung expressed the opinion that, by default, unions have an unspecified safety invariant and therefore, in the absence of clear documentation from the definer crate on a safety invariant, both the producer and consumer crate are unsound. The producer should not create a value which cannot be created with safe Rust, and the consumer crate cannot assume that the value has any properties.

@Lokathor brought up that the safe transmute project is also interested in safe union field access in situations where all fields can be safely transmuted to one another. That is to say, unsafe would not be required to use a type like union { i: i32, f: f32 }. Since this is trivially true of single-field unions, this would mean that get_i would be not only sound but actually safe.

This requires a safety invariant, as clearly the uninitialized union would cause safe field access to be unsound. In reply, @RalfJung suggested that for unions he would simply have suggested the trivial invariant, i.e., all values are safe, including uninitialized ones, but that this safe transmutation would suggest an additional invariant. Note that if all unions have a trivial safety invariant, then the producer crate above would be sound.
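To make the type-punning case concrete, here is a small sketch of what such a union looks like today, where the field read still needs `unsafe`. The names `IntOrFloat` and `float_bits` are hypothetical and chosen only for this illustration.

```rust
// A union whose fields have the same size and accept every bit pattern.
#[derive(Clone, Copy)]
#[repr(C)]
pub union IntOrFloat {
    pub i: i32,
    pub f: f32,
}

// Reinterprets the bits of an `f32` as an `i32` through the union.
pub fn float_bits(x: f32) -> i32 {
    // Constructing a union value is safe and initializes all four bytes.
    let u = IntOrFloat { f: x };
    // SAFETY: `u` was initialized through `f`, and every initialized
    // 32-bit pattern is a valid `i32`.
    unsafe { u.i }
}
```

The read is only justified because `u` is known to be initialized. If an uninitialized `IntOrFloat` could reach this code from safe callers, the same read would be UB, which is why making the access safe needs more than the trivial invariant.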

So what actually is the safety invariant? Some options:

  1. Unspecified safety invariant: both producer and consumer are unsound.
  2. An "initializedness" safety invariant: producer is unsound, but consumer is sound (and possibly later safe with compiler support), because safe code cannot produce the uninitialized value.
  3. A more complex safety invariant based on safety of mutual field transmutation, to allow general type punning using unions.
    1. This could be automatic for all types with mutual transmutation; it is viable from a logical and technical point of view but may be hard to apply in practice in unsafe code, as it would require the coder to carefully think about whether the union has safe transmutation. It would also create a risk that changes to the union definition would silently create new UB (e.g. by adding a new field to a struct member of the union, which is normally not a breaking change).
    2. There was at one point a suggestion that safety could be defined as "this value can be arrived at through an arbitrary sequence of safe operations." This is also logically possible, but probably even worse. And probably also useless in practice.
    3. @CAD97 suggested a new attribute, e.g. #[safe_transmute_union], which could allow an opt-in mutual transmutation safety invariant. Thus either 1 or 2 would apply by default, but the attribute would change the invariant.

I am extremely partial to this last option because it makes it clear when a union does or does not support this type of invariant. And I do not like the idea of unspecified safety invariants, so I would go with option 2 for the default. (edit: see below)
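As a rough illustration of option 2 as the default, the sketch below models the crates as modules in one file (an assumption made so the example compiles on its own). The doc comment on `U` is a hypothetical way of writing the invariant down. Under it, the producer has to return an initialized value, and the consumer's `unsafe` read can rely on that.

```rust
pub mod definer {
    /// Safety invariant (hypothetical wording): every `U` that reaches safe
    /// code has its field `i` initialized.
    #[repr(C)]
    pub union U {
        pub i: i32,
    }
}

pub mod producer {
    use super::definer::U;

    // Sound under the invariant: the value is built with a safe union
    // expression, so `i` is initialized.
    pub fn new_u() -> U {
        U { i: 0 }
    }
}

pub mod consumer {
    use super::definer::U;

    pub fn get_i(u: U) -> i32 {
        // SAFETY: relies on the documented invariant that `i` is initialized.
        unsafe { u.i }
    }
}
```

With this split of obligations, `consumer::get_i(producer::new_u())` is well defined, and a producer that returned uninitialized memory would be the unsound party.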

I'm opening a separate issue about the offset of the field (which is why #[repr(C)] is required in the example).
